PARAMETER 

ESTIMATION 

IN ENGINEERING 
AND SCIENCE 


JAMES V. BECK 

Department of Mechanical Engineering and 
Division of Engineering Research 
Michigan State University 
and 

KENNETH J. ARNOLD 

Department of Statistics and Probability 
Michigan State University 


John Wiley & Sons 

New York London Sydney Toronto 






Copyright © 1977 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
No part of this book may be reproduced by any means,
nor transmitted, nor translated into a machine language
without the written permission of the publisher.

Library of Congress Cataloging in Publication Data
Beck, James V., 1930-

Parameter estimation in engineering and science.
(Wiley series in probability and mathematical
statistics)

Includes bibliographical references and index.
1. Engineering--Statistical methods.
2. Estimation theory. I. Arnold, Kenneth J.,
1914- joint author. II. Title.
TA340 B39 620'00l 51954 77-635 

ISBN 0-471-06118-2

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1



to BARBARA

and PAULINE



Preface 


Parameter estimation is a modern and exciting discipline that is growing 
rapidly. One of the indications that it is a young field of study is that the 
name of the discipline and many of its terms have not yet been agreed 
upon; other names for parameter estimation are nonlinear estimation, 
nonlinear regression, optimization of parameters, calibration, model build- 
ing, identification, and identification of systems. 

The objectives of this book are to provide (1) methods for estimating 
constants (i.e., parameters) appearing in mathematical models, (2) esti- 
mates of the accuracy of the estimated parameters, and (3) tools and 
insights for developing improved mathematical models. The book presents 
methods that can use all the available statistical information. We would 
say that the three objectives cover the main problems in parameter 
estimation. Some of the names mentioned above emphasize certain of 
them; for example, model building and identification emphasize the third 
objective. 

The book has been used in printed note form for five years and has 
undergone continual revision and, we hope, improvement. The first author
initially wrote all eight chapters, with Chapters 2 and 3 later being replaced by
those written by the second author. Both authors, however, contributed to
all the chapters.



Several different courses can be taught from the book. A beginning
course at the undergraduate level could be taken from the first five
chapters with the emphasis on Chapter 5, which gives an algebraic treatment
of linear parameter estimation. Chapters 2 and 3 provide a concise
treatment of the required statistical background so that the student would
find the course easier if he had previously taken a probability or statistics
course. A graduate course would include Chapters 1, 6, 7, and 8 with the
matrix approach being used freely. Ideally, students would have a background
in elementary probability, statistics, and matrices for such a course.
Because of the first author’s background in heat transfer, many of the
examples are taken from this field, but knowledge of this field or any other
particular field in engineering or science is not required.

This book has a number of unique aspects. One of these relates to the
care in treatment of statistical assumptions. Eight standard assumptions
are identified, and a numbering system is given to provide a convenient
means of designating them. These assumptions are listed in Sections 5.1.3
and 6.1.5, with an abbreviated form inside the back cover.

There are many people who have helped in the preparation of this text
to whom we express our appreciation. The notes in various revisions were
typed by Mrs. Noralee Burkhardt, Ms. Roberta Smith, Miss Gloria
Mannino, Mrs. Kathy Winnie, Mrs. Barbara Coolen, Miss Jan Furtaw,
Mrs. Constance Ceraci, Miss Patti Truax, Mrs. Pat Holewinski, Mrs.
Brenda Stott, and Ms. Mary Beth Mac Donell. The encouragement of the
editor, Ms. Beatrice Shube, is particularly appreciated. Some of the results
given in this book are related to research done by the first author and
supported by National Science Foundation Grants GK-16526, GK-240,
GK-2075, and GK-41495.


James V. Beck
Kenneth J. Arnold


December 1976
East Lansing, Michigan



Contents 


Chapter 1 Introduction to and Survey of Parameter Estimation 1 

1.1 INTRODUCTION, 1 

1.1.1 Parameters, Properties, and States, 2 

1.1.2 Purpose of This Chapter, 3 

1.1.3 Related Research, 4 

1.1.4 Relation to Analytical Design Theory, 5 

1.2 FUNDAMENTAL PROBLEMS, 5 

1.2.1 Deterministic or Classical Problem, 5 

1.2.2 State Estimation Problem, 6 

1.2.3 Parameter Estimation Problem, 7 

1.2.4 Optimum Experiment Problem, 7 

1.2.5 Discrimination Problem, 8 

1.2.6 Identification Problem, 8 

1.3 SIMPLE EXAMPLES, 8 

1.3.1 Linear Algebraic Model, 8 

1.3.2 Linear First Order Differential Equation Model, 12 

1.3.3 Partial Differential Equation Example, 16 

1.4 SENSITIVITY COEFFICIENTS, 17 



1.5 IDENTIFIABILITY, 19

1.6 SUMMARY AND CONCLUSIONS, 23

REFERENCES, 24

PROBLEMS, 24

Chapter 2 Probability 29

2.1 RANDOM HAPPENINGS, 29

2.2 EVENTS, 32

2.2.1 Events Random Variables and 32

2.2.2 Discrete and Continuous Sample Spaces and
Associated Probabilities, 34

2.2.3 Assigned Probabilities and Experience with Chance
Events, 36

2.3 PROBABILITY DISTRIBUTIONS, 36

2.3.1 Univariate Probability Distributions. Distribution
Functions, 36

2.3.2 Multivariate Distributions, 39

2.3.3 Sample Paths, 42

2.4 CONDITIONAL PROBABILITIES, 43

2.4.1 Conditional Distributions. Discrete Case, 43

2.4.2 Conditional Distributions. Continuous Case, 44

2.4.3 Bayes’s Theorem, 46

2.5 FUNCTIONS OF RANDOM VARIABLES, 48

2.6 EXPECTATIONS, 51

2.6.1 Expected Value, 51

2.6.2 Variance, Covariance, and Correlation, 56

2.6.3 Stochastic Processes. Autocovariance, Cross-covariance, 59

2.6.4 Stationarity, 61

2.7 LAW OF LARGE NUMBERS. CENTRAL LIMIT THEOREM, 62

2.7.1 Chebyshev’s Inequality, 62

2.7.2 Weak Law of Large Numbers, 63

2.7.3 Central Limit Theorem, 64

2.8 EXAMPLES OF DISTRIBUTIONS, 65

2.8.1 Bernoulli Distributions, 65

2.8.2 Binomial Distributions, 65


2.8.3 Poisson Distributions, 66

2.8.4 Uniform Distributions, 67

2.8.5 Normal Distributions, 67

2.8.6 Multivariate Normal Distributions, 71

2.8.7 Gamma Distributions, 72

2.8.8 Chi-squared Distributions, 73

2.8.9 t Distributions, 75

2.8.10 F Distributions, 76

2.8.11 Tables and Computer Programs for Commonly Used
Statistics, 77


REFERENCES, 77 

PROBLEMS, 78 

Chapter 3 Introduction to Statistics 84

3.1 SOME EXAMPLES OF ESTIMATORS, 85

3.1.1 Two Estimators of the Center of a Symmetric Distribution, 85

3.1.2 Estimating a Variance, 87

3.2 PROPERTIES OF ESTIMATORS, 89

3.2.1 Unbiasedness, 89

3.2.2 Consistency, 90

3.2.3 Efficiency. Minimum Variance Unbiased Estimators, 91

3.2.4 Sufficiency, 93

3.2.5 Maximum Likelihood Estimators, 94

3.2.6 Estimators a posteriori, 97

3.2.7 Bayes Squared Error Loss Estimators. MAP Estimators, 98

3.2.8 Bayes Intervals, 100

3.2.9 Minimizing Expected Cost, 101

3.3 CONFIDENCE INTERVALS, 102

3.3.1 Confidence Intervals for the Mean of a Normal
Population when the Population Standard Deviation
is Known, 102

3.3.2 Confidence Intervals for the Standard Deviation of a
Normal Population, 105

3.3.3 Confidence Intervals for the Mean of a Normal
Population when the Population Standard Deviation
is Unknown, 106

3.4 HYPOTHESIS TESTING, 108

3.4.1 Two Simple Hypotheses, 109

3.4.2 Problems Reducible to Problems of Two Simple
Hypotheses, 111

3.4.3 Generalized Likelihood Ratio Tests. Power, 112

REFERENCES, 114

PROBLEMS, 114

Chapter 4 Parameter Estimation Methods 117

4.1 INTRODUCTION, 117

4.2 RELATIONS BETWEEN OBSERVED RANDOM VARIABLES AND
PARAMETERS, 117

4.3 EXPECTED VALUES, VARIANCES, COVARIANCES, 120

4.4 LINEAR PROBLEMS, 120

4.5 LEAST SQUARES, 120

4.6 GAUSS-MARKOV ESTIMATION, 121

4.7 SOME OTHER ESTIMATORS, 122

4.8 COST, 125

4.9 MONTE CARLO METHODS, 125

REFERENCES, 129

Chapter 5 Introduction to Linear Estimation 130

5.1 MOTIVATION, MODELS, AND ASSUMPTIONS, 130

5.1.1 Motivation, 130

5.1.2 Models, 131

5.1.3 Statistical Assumptions Regarding the Measurement
Errors, 134

5.2 ORDINARY LEAST SQUARES ESTIMATION (OLS), 135

5.2.1 Models 1 and 2 ($\eta_i = \beta_0$ and $\eta_i = \beta_1 X_i$), 135

5.2.1.1 Mean and Variances of Estimates, 136
5.2.1.2 Expected Value of 137

5.2.2 Two-Parameter Models, 140

5.2.2.1 Model 5, $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, 140
5.2.2.2 Model 3, $\eta_i = \beta_0 + \beta_1 X_i$, 143

5.2.2.3 Estimators for Model 4, $\eta_i = \beta_0 + \beta_1 (X_i - \bar{X})$,
149

5.2.2.4 Optimal Experiments for Models 3 and 4,
149

5.2.3 Comments Regarding Definitions, 151 

5.3 MAXIMUM LIKELIHOOD (ML) ESTIMATION, 154 

5.3.1 One-Parameter Cases, 155 

5.3.2 Two-Parameter Cases, 156 

5.3.3 Estimating $\sigma^2$ Using Maximum Likelihood, 157

5.3.4 Maximum Likelihood Estimation Using Information 
from Prior Experiments, 158 

5.4 MAXIMUM A POSTERIORI (MAP) ESTIMATION, 159 

5.4.1 Random Parameter Case, 159 

5.4.2 Subjective Prior Information, 162 

5.4.3 Comparison of Viewpoints, 165 

5.5 MULTIPLE DATA POINTS, 167 

5.5.1 Sum of Squares, 168 

5.5.2 Parameter Estimates, 170 

5.6 COEFFICIENT OF MULTIPLE DETERMINATION ($R^2$), 173

5.7 ANALYSIS OF VARIANCE ABOUT THE SAMPLE MEAN, 175 

5.8 ANALYSIS OF VARIANCE ABOUT THE REGRESSION LINE FOR MUL- 
TIPLE MEASUREMENTS AT EACH Xj, 178 

5.8.1 Expected Values of for Incorrect Model, 180 

5.8.2 F-Test with Repeated Data, 181

5.9 CONFIDENCE INTERVAL ABOUT THE POINTS ON THE REGRESSION 
LINE, 184 

5.10 VIOLATION OF THE STANDARD ASSUMPTION OF ZERO MEAN 
ERRORS, 185 

5.11 VIOLATION OF THE STANDARD ASSUMPTION OF NORMALITY, 186 

5.12 VIOLATION OF THE STANDARD ASSUMPTION OF CONSTANT VARI- 
ANCE, 188 

5.12.1 Variance of $\varepsilon_i$ Given by $\sigma_i^2 = (X_i/\delta)^2 \sigma^2$, 189

5.12.2 Variance of $\varepsilon_i$ Equal to 190

5.13 VIOLATION OF STANDARD ASSUMPTION OF UNCORRELATED 
ERRORS, 190 



5.14 ERRORS IN INDEPENDENT AND DEPENDENT VARIABLES, 192

5.14.1 Method of Lagrange Multipliers, 192

5.14.2 Problem of Errors in the Independent and Dependent
Variables, 195

5.14.3 Model 2 ($\eta_i = \beta \xi_i$) Example with Errors in both $\eta_i$
and $\xi_i$, 198
REFERENCES, 204

PROBLEMS, 205

Chapter 6 Matrix Analysis for Linear Parameter Estimation 213

6.1 INTRODUCTION TO MATRIX NOTATION AND OPERATIONS, 213

6.1.1 Elementary Matrix Operations, 213

6.1.1.1 Product of Matrices, 214
6.1.1.2 Transpose of Matrix, 215
6.1.1.3 Inverse, Determinant, and Eigenvalues, 215
6.1.1.4 Partitioned Matrix, 218
6.1.1.5 Positive Definite Matrices, 218
6.1.1.6 Trace, 219

6.1.2 Matrix Calculus, 219

6.1.3 Quadratic Form, 221

6.1.4 Expected Value of Matrix and Variance-Covariance
Matrix, 222

6.1.4.1 Expected Value Matrix, 222
6.1.4.2 Variance-Covariance Matrix, 222
6.1.4.3 Covariance of Linear Combination of Vector
Random Variables, 223
6.1.4.4 Expected Value of a Quadratic Form, 224

6.1.5 Model in Matrix Terms, 225

6.1.5.1 Identifiability Condition, 228
6.1.5.2 Assumptions, 228

6.1.6 Maximum Likelihood Sum of Squares Functions, 230

6.1.6.1 Single Dependent Variable (Single Response)
Case, 230
6.1.6.2 Several Dependent Variables (Multiresponse)
Case, 231

6.1.7 Gauss-Markov Theorem, 232

6.2 LEAST SQUARES ESTIMATION, 234

6.2.1 Ordinary Least Squares Estimator (OLS), 234

6.2.2 Mean of the OLS Estimator, 238

6.2.3 Variance-Covariance Matrix of 238





6.2.4 Relations Involving the Sum of Squares of Residuals, 
240 

6.2.5 Distributions of Rls '^ls* ^41 

6.2.6 Weighted Least Squares (WLS), 247 

6.3 ORTHOGONAL POLYNOMIALS IN OLS ESTIMATION, 248 

6.4 FACTORIAL EXPERIMENTS, 252 

6.4.1 Introduction, 252 

6.4.2 Two-Level Factorial Design, 253 

6.4.3 Coding the Factors, 254 

6.4.4 Inclusion of Interaction Terms in the Model, 255 

6.4.5 Estimation, 255 

6.4.6 Importance of Replicates, 258 

6.4.7 Other Experiment Designs, 258 

6.5 MAXIMUM LIKELIHOOD ESTIMATOR, 259 

6.5.1 ML Estimation, 259 

6.5.2 Estimation of $\sigma^2$, 262

6.5.3 Expected Values of -^ml> 267 

6.6 LINEAR MAXIMUM A POSTERIORI ESTIMATOR (MAP), 269 

6.6.1 Introduction, 269 

6.6.2 Assumptions, 270 

6.6.3 Estimation Involving Random Parameters, 270 

6.6.4 Estimation with Subjective Information, 272 

6.6.5 Uncertainty in $\psi$, 273

6.7 SEQUENTIAL ESTIMATION, 275 

6.7.1 Introduction, 275 

6.7.2 Direct Method, 275 

6.7.3 Sequential Method Using Matrix Inversion Lemma, 
276 

6.7.3.1 Estimation with Only One Observation at
Each Time ($m = 1$), 278

6.7.3.2 Sequential Analysis of Example 5.2.4, 282 

6.7.4 Sequential MAP Estimation, 284 

6.7.5 Multiresponse Sequential Parameter Estimation, 286 

6.7.6 Ridge Regression Estimation, 287 

6.7.7 Comments and Conclusions on the Sequential Esti- 
mation Method, 288 

6.8 MATRIX FORMULATION FOR CONFIDENCE INTERVALS AND RE- 
GIONS, 289 

6.8.1 Confidence Intervals, 290 





6.8.2 Confidence Regions for Known $\psi$, 290

6.8.3 Confidence Regions for $\psi = \sigma^2 \Omega$ with $\Omega$ Known and
$\sigma^2$ Unknown, 299

6.9 MATRIX ANALYSIS WITH CORRELATED OBSERVATION ERRORS, 301

6.9.1 Introduction, 301

6.9.2 Autoregressive Errors (AR), 303

6.9.2.1 OLS Estimation with AR Errors, 306
6.9.2.2 ML Estimation with AR Errors, 308

6.9.3 Moving Average Errors (MA), 312

6.9.4 Summary of First Order Correlated Cases for the
Model $\eta = \beta$, 313

6.9.5 Simultaneous Estimation of $\rho$, and Physical
Parameters for the al Cases, 315
REFERENCES, 319

APPENDIX 6A AUTOREGRESSIVE MEASUREMENT ERRORS, 320

APPENDIX 6B MATRIX INVERSION LEMMA, 326

PROBLEMS, 327


Chapter 7 Minimization of Sum of Squares Functions for Models
Nonlinear in Parameters 334

7.1 INTRODUCTION, 334

7.1.1 Trial and Error Search, 335

7.1.2 Exhaustive Search, 336

7.1.3 Other Methods, 337

7.2 MATRIX FORM OF TAYLOR SERIES EXPANSION, 338

7.3 SUM OF SQUARES FUNCTION, 338

7.4 GAUSS METHOD OF MINIMIZATION, 340

7.4.1 Derivation, 340

7.4.2 Components of Gauss Linearization Equation, 342

7.4.3 Comments on Gauss Linearization Equation, 346

7.4.4 Linear Dependence of Sensitivity Coefficients, 349

7.5 EXAMPLES TO ILLUSTRATE GAUSS MINIMIZATION METHOD INVOLVING
ORDINARY DIFFERENTIAL EQUATIONS, 350

7.5.1 Estimation of a Parameter for a Long Fin, 350

7.5.2 Example of Estimation of Parameters in Cooling
Billet Problem, 357




7.6 MODIFICATIONS OF GAUSS METHOD, 362 

7.6.1 Box-Kanemasu Interpolation Method, 362 

7.6.2 Levenberg Damped Least Squares Method, 368 

7.6.3 Marquardt’s Method, 370 

7.6.4 Comparison of Methods, 371 

7.6.4.1 Box-Kanemasu Example, 372

7.6.4.2 Bard Comparisons, 375 

7.6.4.3 Davies and Whitting Comparison, 376 

7.7 MODEL BUILDING AND CONFIDENCE REGIONS, 378 

7.7.1 Approximate Covariance Matrix of Parameters, 378 

7.7.2 Approximate Correlation Matrix, 379 

7.7.3 Approximate Variance of Y, 380 

7.7.4 Approximate Confidence Intervals and Regions, 380 

7.7.5 Model Building Using the F Test, 386 

7.8 SEQUENTIAL ESTIMATION FOR MULTIRESPONSE DATA, 387 

7.8.1 Assumptions, 388 

7.8.2 Direct Method, 389 

7.8.3 Sequential Method Using the Matrix Inversion 
Lemma, 391 

7.8.3.1 Sequential Method for $m = 1$, $p$ Arbitrary,
392

7.8.4 Correlated Errors with Known Correlation Parame- 
ters, 393 

7.9 EXAMPLES UTILIZING SEQUENTIAL ESTIMATION, 393 

7.9.1 Simple MAP Example Involving Multiresponse Data, 
394 

7.9.2 Cooling Billet Problem, 397 

7.9.2.1 Other Possible Models, 398

7.9.3 Semi-Infinite Body Heat Conduction Example, 400 

7.9.4 Analysis of Finite Heat-Conducting Body with Multiresponse
Experimental Data, 402

7.9.4.1 Description of Equipment, 402

7.9.4.2 Physical Model of Heat-Conducting Body, 
406 

7.9.4.3 Parameter Estimates, 407 

7.10 SENSITIVITY COEFFICIENTS, 410 

7.10.1 Finite Difference Method, 410 

7.10.2 Sensitivity Equation Method, 411 

REFERENCES, 414 
PROBLEMS, 415 



Chapter 8 Design of Optimal Experiments 419

8.1 INTRODUCTION, 419

8.2 ONE PARAMETER EXAMPLES, 420

8.2.1 Linear Examples for One Parameter, 420

8.2.1.1 Model $\eta_i = \beta X_i$ with No Constraints, 420
8.2.1.2 Model for Fixed Large n and
Equally Spaced Measurements, 421
8.2.1.3 Model for Fixed Large n and
Fixed Maximum Value of $|\eta|$, 426

8.2.2 One-Parameter Nonlinear Cases, $\eta = \eta(\beta, t)$, 428

8.2.3 Iterative Search Method, 431

8.3 CRITERIA FOR OPTIMAL EXPERIMENTS FOR MULTIPLE PARAMETERS, 432

8.3.1 General Criteria, 432

8.3.2 Case of Same Number of Measurements as Parameters
($n = p$), 434

8.3.2.1 Linear Examples for $p = 2$, 435
8.3.2.2 Nonlinear Example for $p = 2$, 438

8.4 ALGEBRAIC EXAMPLES FOR TWO PARAMETERS AND LARGE n, 440

8.4.1 Linear Model $\eta = \beta_1 + \beta_2 \sin t$, 440

8.4.2 Exponential Models with One Linear and One Nonlinear
Parameter, 441

8.5 OPTIMAL PARAMETER ESTIMATION INVOLVING THE PARTIAL DIFFERENTIAL
EQUATION OF HEAT CONDUCTION, 444

8.5.1 Semi-Infinite Body Examples, 445

8.5.1.1 Temperature Boundary Condition (Single
Parameter), 445
8.5.1.2 Constant Heat Flux Boundary Condition
(Two Parameters), 448
8.5.1.3 Heat Flux Boundary Condition to Cause a
Step Change in Surface Temperature, 450
8.5.1.4 Summary of Optimal Designs for Semi-Infinite
Bodies Subjected to Heat Flux
Boundary Conditions, 451

8.5.2 Finite Body Examples, 453

8.5.2.1 Sinusoidal Initial Temperature in a Plate,
453
8.5.2.2 Constant Heat Flux at $x = 0$, Insulated at
$x = L$, 454




8.5.3 Additional Cases, 457 

8.5.4 Optimal Heat Conduction Experiment, 458 

8.6 NONSTANDARD ASSUMPTIONS, 459 

8.6.1 Nonconstant Variance, 459 

8.6.2 Correlated Errors, 460 

8.7 SEQUENTIAL OPTIMIZATION, 460 

8.8 NOT ALL PARAMETERS OF INTEREST, 461 

8.9 DESIGN CRITERIA FOR MODEL DISCRIMINATION, 464 

8.9.1 Linearization Method, 465 

8.9.2 Information Theory Method, 467 
8.9.2.1 Termination Criteria, 470 

REFERENCES, 474 

APPENDIX 8A OPTIMAL EXPERIMENT CRITERIA FOR ALL PARAMETERS
OF INTEREST, 475

APPENDIX 8B OPTIMAL EXPERIMENT CRITERIA FOR NOT ALL PARAMETERS
OF INTEREST, 477

PROBLEMS, 478

Appendix A Identifiability Condition 481

Appendix B Estimators and Covariances for Various Estimation
Methods for the Linear Model $\eta = X\beta$ 488

Appendix C List of Symbols 490

Appendix D Some Estimation Programs 493

Index 495



CHAPTER 1

INTRODUCTION TO AND SURVEY OF PARAMETER ESTIMATION


1.1 INTRODUCTION 

One of the fundamental tasks of engineering and science, and indeed of 
mankind in general, is the extraction of information from data. Parameter 
estimation is a discipline that provides tools for the efficient use of data in 
the estimation of constants appearing in mathematical models and for 
aiding in modeling of phenomena. 

The models may be in the form of algebraic, differential, or integral 
equations and their associated initial and boundary conditions. An esti- 
mated parameter may or may not have a direct physical significance. 

Parameter estimation can also be visualized as a study of inverse 
problems. In the solution of partial differential equations one classically 
seeks a solution in a domain knowing the boundary and initial conditions 
and any constants. In the inverse problem not all these constants would be 
known. Instead discrete measurements of the dependent variable inside the 
domain must be used to estimate values for these constants, also called 
parameters. 

Parameter estimation is needed in the modern world for the solution of
the many diverse problems related to the space program, investigation of 
the atom, and modeling of the economy. Examples and applications in this 
book, however, are directed to estimation problems occurring in engineer- 
ing and science in which partial differential equations as well as ordinary 
differential and algebraic equations are used to model the phenomena. 




Fortunately, simultaneous with the development of increased need of
parameter estimation, computers have been built that make parameter
estimation practicable for a great array of applications. It should be noted
that both digital computational and data acquisition facilities are practical
necessities in parameter estimation. Both these facilities have been readily
available only since the late 1950s or early 1960s, whereas estimation was
first extensively discussed by Legendre in 1806 [1] and Gauss in 1809 [2].
In Gauss’s classic paper he claimed usage of the method of least squares
(still used in parameter estimation) as early as 1795 in connection with the
orbit determination of minor planets. For this reason Gauss is recognized
as being the first to use this important tool of parameter estimation.

The name “parameter estimation” is not universally used. Other terms
are nonlinear least squares, nonlinear estimation, nonlinear regression,*
and identification, although the latter sometimes is given a quite different
meaning. Estimation is a statistical term and identification is an electrical
engineering term.

1.1.1 Parameters, Properties, and States

A mathematical model of a dynamic process usually involves ordinary or
partial differential equations. Sometimes the solution of these equations is
a relatively simple set of algebraic equations. In any case there are
dependent and independent variables and also certain constants. The
dependent variables are sometimes called state variables (or signals). The
constants may be parameters.

In experiments the states are frequently measured directly, but the
parameters are not. Approximate values of the parameters are inferred
from measurements of the states. Since only approximate parameter values
are found, the parameters are said to be estimated. This book is primarily
concerned with estimating parameters. Other books concentrate on providing
“best” predictions of states based on knowledge of independent variables.
The two problems are quite similar. When parameters are estimated,
state estimates are usually found simultaneously.

A parameter having a physical significance for a solid or fluid might also
be termed a property. Examples of parameters that might also be properties
are characteristics of materials such as density, specific heat, viscosity,
thermal conductivity, electrical conductivity, Young’s modulus, emittance,
and electrical capacitance. A property frequently involves the concept of
per unit length, area, or volume. Examples of quantities that are parameters
but not properties are mass of a given specimen, electrical
resistance of a section of wire, and drag experienced by a truck moving at
a constant velocity.

*Parameter estimation is not necessarily nonlinear as implied by some of these terms.

The concepts of parameters, properties, and states are illustrated by the 
following examples. 

Example 1.1.1 

Newton’s second law states for a system that $F_x(t) = m a_x(t)$, where $F_x(t)$ is the
force in the x direction, $a_x(t)$ is the acceleration in the x direction, and $m$ is mass.
Force and acceleration, which can be functions of time $t$, can be considered to be
states, whereas mass is a parameter. Force and acceleration are both often easily
measured; usually mass is easily measured separately also, but there can be cases
when the mass must be inferred, such as when determining the mass of comets and
planets. If a body is homogeneous, the mass is the product of its density and
volume. The density is a property. If the volume were known and measurements of
$F_x$ and $a_x$ were available, the parameter to be estimated would be the density, which
also happens to be a property of the material.

Example 1.1.2 

Ohm’s Law states that $E(t) = R I(t)$, where $E(t)$ is voltage, $I(t)$ is current, and $R$ is
electric resistance. Voltage and current are states, whereas resistance is a parameter.
If the resistance of a wire of known length and diameter is being determined,
one could instead estimate the electric resistivity of that type of wire, a parameter
which is also a property.
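
As a minimal sketch of how the resistance in Example 1.1.2 might be estimated, suppose several current and voltage readings are available and that the measurement errors are confined to the voltage readings. Minimizing the sum of squares of deviations for a line through the origin then gives $\hat{R} = \sum I_i E_i / \sum I_i^2$. The function name and the data below are hypothetical and chosen only for illustration; they do not come from the text.

```python
def estimate_resistance(currents, voltages):
    """Least squares estimate of R in E = R * I (line through the origin)."""
    numerator = sum(i * e for i, e in zip(currents, voltages))
    denominator = sum(i * i for i in currents)
    if denominator == 0:
        raise ValueError("at least one nonzero current is required")
    return numerator / denominator


# Hypothetical readings: a resistance near 5 ohms with small made-up errors.
currents = [0.5, 1.0, 1.5, 2.0]   # amperes (the states I)
voltages = [2.6, 4.9, 7.6, 9.9]   # volts (the states E)
r_hat = estimate_resistance(currents, voltages)
print(f"estimated resistance: {r_hat:.3f} ohm")
```

Only the states (voltage and current) are measured; the parameter is inferred from them, which is the distinction the example draws.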

Example 1.1.3

An object is thrown vertically above the earth with an initial velocity of $V_0$. From
the solution of the appropriate differential equation, the distance $s$ of the object
above the earth is described by $s = V_0 t - g t^2/2$, where $g$ is the acceleration of
gravity. Here $s$ would be the state and $g$ the parameter. The independent variable is
time $t$. $V_0$ could be a parameter or a state. This illustrates that the parameter and
state estimation problems sometimes overlap.
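
The following sketch treats both $V_0$ and $g$ in Example 1.1.3 as parameters and estimates them from measured heights by ordinary least squares. Because $s = V_0 t - g t^2/2$ is linear in $V_0$ and $g$, the estimates follow from a 2-by-2 set of normal equations. The function name and the error-free synthetic data are assumptions made for illustration, not part of the text.

```python
def estimate_v0_and_g(times, heights):
    """Least squares estimates of V0 and g in s = V0*t - g*t**2/2."""
    x1 = list(times)                       # multiplies V0
    x2 = [-0.5 * t * t for t in times]     # multiplies g
    a11 = sum(u * u for u in x1)
    a12 = sum(u * v for u, v in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    b1 = sum(u * s for u, s in zip(x1, heights))
    b2 = sum(v * s for v, s in zip(x2, heights))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("measurement times do not determine both parameters")
    # Cramer's rule for the 2-by-2 normal equations.
    v0 = (b1 * a22 - a12 * b2) / det
    g = (a11 * b2 - a12 * b1) / det
    return v0, g


# Synthetic, error-free heights generated with V0 = 20 m/s and g = 9.8 m/s^2.
times = [0.5, 1.0, 1.5, 2.0, 2.5]
heights = [20.0 * t - 0.5 * 9.8 * t * t for t in times]
v0_hat, g_hat = estimate_v0_and_g(times, heights)
print(v0_hat, g_hat)
```

With real measurements the heights would contain errors and the estimates would only approximate the true values.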

Above, the term parameter was applied to what might be termed 
“physical” parameters in contrast with statistical parameters. Examples of
statistical parameters are variances and correlation coefficients of the 
measurement errors. Both types of parameters may have to be estimated in 
some problems; in others only the physical parameters need be found. 

1.1.2 Purpose of This Chapter 

The purpose of this chapter is to survey some of the basic problems and
concepts of parameter estimation covered in this book. By understanding
some of the ideas in a simple form, the reader can better comprehend the
detailed treatment in later chapters.

Much of parameter estimation can be related to five optimization
problems. The first problem is the choice of the best function to extremize.
The most common function chosen for minimizing is the sum of squares of
deviations. This yields the method of least squares discussed in Section 1.3.
The second optimization problem is the minimization of the chosen
function, which is also discussed in Section 1.3. These first two optimization
problems are usually what is meant by the parameter estimation
problem, which is discussed further in Section 1.2.3. The third optimization
problem involves optimal design of experiments to obtain the “best”
parameter estimates. This is discussed in Section 1.2.4. If several competing
mathematical models are known, but the true model is unknown, the
(fourth) problem is the optimal design of experiments to discriminate
between the models; see Section 1.2.5. (By a “known” model we mean that
the mathematical structure of the equation is known even though the
values of certain parameters may be unknown.) The fifth and most
difficult problem is the determination of mathematical models when there
is so little information that a complete finite list of competing models is
not known (Section 1.2.6).

In addition to the basic optimization problems in parameter estimation
there are a number of basic concepts. One of these relates to what we call
sensitivity coefficients. A sensitivity coefficient is formed by taking the first
derivative of a dependent variable (i.e., a state variable) with respect to a
parameter. This is discussed further in Section 1.4. Sensitivity coefficients
are important because they can give information regarding linearity and
identifiability. Linearity is concerned with the dependent variable(s) being
linear or nonlinear in the parameters (Section 1.4). There are cases where
unique solutions for some parameters do not exist; this relates to identifiability,
which is discussed in Sections 1.3 and 1.4.
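
As an illustrative sketch of the definition above, the sensitivity coefficients of the falling-object model $s = V_0 t - g t^2/2$ can be approximated by central finite differences. For this model the exact values are $\partial s/\partial V_0 = t$ and $\partial s/\partial g = -t^2/2$; neither depends on the parameters, which is what is meant by the model being linear in the parameters. The helper names and the step size are assumptions made for illustration.

```python
def height(t, v0, g):
    """Model for the state s as a function of time and the two parameters."""
    return v0 * t - 0.5 * g * t * t


def sensitivity(model, t, params, index, rel_step=1e-6):
    """Central-difference estimate of d(model)/d(params[index]) at time t."""
    step = rel_step * max(abs(params[index]), 1.0)
    up = list(params)
    down = list(params)
    up[index] += step
    down[index] -= step
    return (model(t, *up) - model(t, *down)) / (2.0 * step)


params = [20.0, 9.8]                               # hypothetical V0 and g
ds_dv0 = sensitivity(height, 2.0, params, 0)       # exact value: t = 2
ds_dg = sensitivity(height, 2.0, params, 1)        # exact value: -t**2/2 = -2
print(ds_dv0, ds_dg)
```

Repeating the calculation at a different parameter vector returns the same coefficients for this model; for a model that is nonlinear in its parameters they would change.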

Another concept, parsimony, long a principle by which choices among
alternative explanations of physical phenomena have been made, has
recently been emphasized by Box [4] in its application to parameter
estimation. Box asks that a model have a minimum number of parameters
consistent with the physical basis, if there is one.

1.1.3 Related Research

A number of individuals and groups have contributed to the research in
the past decade. Some of the statisticians and engineers who have made
significant contributions to parameter estimation are G. E. P. Box, N.
Draper, J. S. Hunter, M. J. Box, W. G. Hunter, Y. Bard, J. H. Seinfeld,
and L. Lapidus. Some books on parameter estimation with a statistical
emphasis are given in references 3 through 6.

Another group that has made a large contribution to estimation is the 
control and systems group of electrical engineers. Most of their work 
relates to state estimation rather than parameter estimation, which they 
call identification. Some books on state estimation are by Sage and Melsa 
[7], Sage [8], Deutsch [9], and Bryson and Ho [10]. Books by Sage and 
Melsa [11], Graupe [12], and Mendel [13] discuss identification. This group 
usually is concerned with estimating states and parameters in sets of 
ordinary differential equations. 

Another group is composed of econometricians, that is, economists with 
a strong interest in statistics. One reference is Kmenta [14]. Econometri- 
cians have generally concerned themselves with models which can be 
approximated adequately by systems of linear algebraic equations. 

Other valuable works, not identified with any of the above groups, are 
references 15-17. 

1.1.4 Relation to Analytical Design Theory 

Parameter estimation is related to analytical design theory. The concepts 
of choosing the best cost function and minimizing it are common to both. 
Because of the similarity, there may be some technology transfer from 
parameter estimation theory to design theory. 

In addition to the design aspects related to cost functions mentioned 
above, parameter estimation is also concerned with the design of “best” 
experiments. This involves various ideas including criteria, constraints, and 
sensitivity. Furthermore, modern design is utilizing statistics to a greater 
extent than formerly to describe tolerances, life of structures, and so on. 

1.2 FUNDAMENTAL PROBLEMS 

A number of distinctions between various estimation problems should be 
understood. Some of these distinctions can be confusing because similar, 
but not identical problems, are encountered in control engineering. 

1.2.1 Deterministic or Classical Problem 

In the classical problem one mathematically models a system in a certain
domain and seeks to calculate the dependent variable(s) in the domain for
a known model and initial and boundary conditions. There is no uncertainty
in any of these.

Figure 1.1 Classical problem of known input and system. The problem is to calculate
the output state.

Not only is the structure of the model known (e.g.,
an ordinary differential equation of known order), but all the relevant
parameters or properties are known. This is illustrated by Fig. 1.1, which
might be visualized as the heating of a billet with $q(t)$ being the heat input
and $\eta(t)$ the temperature of the billet. The model might be

= (I 2 I) 

(12 2 ) 

where (1.2.1) is the differential equation containing the input $q(t)$ and
(1.2.2) is the initial condition. All the parameters, $\beta$ and $q(t)$, are
known. The objective is to calculate the state $\eta(t)$ for $t > 0$; in other
words, to solve the differential equation.

Of all the problems listed here, the classical problem is the one engineers
are most often trained to solve. It is not the subject of this book.

1.2.2 State Estimation Problem 

In the state estimation problem, the state is estimated using measurements
of the input and the state. See Fig. 1.2. This problem is similar to the
classical one in that $\eta(t)$ is needed and the model is known. There are extra
complications, however, in that the observed input contains the noise $w(t)$
and that measurements are only available for the output state $\eta$ corrupted
with the noise $\varepsilon(t)$. (“Noise” means nonsystematic measurement errors.)
Using the preceding example, one could still write (1.2.1), but $q(t)$ would
not be precisely known and neither would $\beta$ in (1.2.2). In the solution of
this problem the statistics of $w(t)$ and $\varepsilon(t)$ are usually assumed to be
known. One seeks a “best” or optimal estimate $\hat{\eta}(t)$ of the true system state
$\eta(t)$. It is in connection with this problem that the term “filter” is used
[7-12].

Figure 1.2 State estimation problem. The problem is to estimate $\eta(t)$.






1.2.3 Parameter Estimation Problem 

In the parameter estimation problem the structure of the differential
equation is known; measurements of the input $q(t)$ as well as the initial
condition(s) [$\beta$ in (1.2.2)] or boundary conditions are available. Some or
all of the parameters may be unknown. The problem is to obtain the
“best” or optimal estimate of these parameters using the measured values
of input and output.

Because measurements invariably contain errors, the solution of parameter
estimation problems utilizes concepts of probability and statistics. The
requisite probability background is reviewed in Chapter 2 and the bases of 
the statistical methods are given in Chapter 3. The reader who has an 
adequate background in probability and statistics can omit these two 
chapters. 

This estimation problem is illustrated by Fig. 1.3. The “unknown”
system is modeled by a differential equation containing unknown parameters.
This problem also involves state estimation because $\eta(t)$ is unknown
and is usually estimated at the same time as the parameters.

The investigation of the parameter estimation problem is a primary 
objective of this text. The emphasis is on parameter estimation techniques 
that are appropriate for analysis of dynamic experiments. Methods useful 
for estimation involving linear and nonlinear partial differential equations 
are given particular attention. 

1.2.4 Optimum Experiment Problem 

The optimum experiment problem can be illustrated using Fig. 1.3. An
objective is to adjust any inputs such as $q(t)$ or boundary and initial
conditions so as to minimize the effect of errors on estimated values of the
parameters. In other words, the output $\eta(t)$ would be made as “sensitive”
as possible to the parameters. Adjusting $q(t)$ means the selection of the
time variation to accomplish the objective. Another objective would be to
find the best location for sensors and the best duration for taking measurements.
There are certain realistic constraints that must be included when
seeking these optimums, such as the maximum allowable experiment
duration and maximum temperature rise. Optimum experiments are discussed
mainly in Chapter 8.

Figure 1.3 Parameter estimation problem. The problem is to estimate certain unknown
parameters in the model of the system.


1.2.5 Discrimination Problem 

In the discrimination problem there are two or more possible candidates
for the model, one of which is the true model. The objective is to design
experiments that will enable one to decide upon the correct model. There
are some similarities with the optimum experiment problem. This is also
discussed in Chapter 8.

1.2.6 Identification Problem 

The identification problem is similar to the parameter estimation problem
in that there may be unknown parameters in the model. The problem is
much more complex because the structure of the model (e.g., differential
equation) is unknown. Developing models is sometimes called model
building, which is discussed in various connections in Chapters 5 to 8.


1.3 SIMPLE EXAMPLES

Typical problems are outlined in this section for estimation problems
involving algebraic, ordinary differential, and partial differential equations.
These examples are given to introduce the student to a number of parameter
estimation concepts that are amplified in subsequent chapters.

1.3.1 Linear Algebraic Model

Suppose that a number of distinct experiments have been performed for a
given material at different temperatures, $T$, and that at each $T$ a value of
the thermal conductivity $k$ has been determined. Hence there are a number
of data sets $(Y_1, T_1), (Y_2, T_2), \ldots, (Y_n, T_n)$, where $Y_i$ is a measured value of $k$ and $T_i$
is the temperature in the $i$th experiment. The data are shown in Fig. 1.4.
A model must now be proposed for $k$ versus $T$. If there is any applicable
physical law relating $k$ and $T$ it should be used. For this example none is
known but the data of Fig. 1.4 suggest a linear relation in $T$,

$$k = \beta_0 + \beta_1 T \tag{1.3.1}$$






Figure 1.4 Simulated data for thermal conductivity vs. temperature. 

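where $\beta_0$ and $\beta_1$ are unknown parameters. The measurements $Y_i$ are
related to $k(T_i)$, abbreviated $k_i$, by

$$Y_i = k_i + \epsilon_i = \beta_0 + \beta_1 T_i + \epsilon_i \tag{1.3.2a}$$

where $\epsilon_i$ is an unknown error. For $n$ measurements there are $n$ equations
with two parameters and $n$ unknown errors,

$$\begin{aligned} Y_1 &= \beta_0 + \beta_1 T_1 + \epsilon_1 \\ &\;\;\vdots \\ Y_n &= \beta_0 + \beta_1 T_n + \epsilon_n \end{aligned} \tag{1.3.2b}$$

If $n = 1$, both $\beta_0$ and $\beta_1$ cannot be estimated. If $n = 2$, estimates of $\beta_0$ and
$\beta_1$ can be obtained from (1.3.2b) by neglecting $\epsilon_1$ and $\epsilon_2$ and solving the
two equations to obtain

$$\hat{\beta}_0 = \frac{Y_1 T_2 - Y_2 T_1}{T_2 - T_1}, \qquad \hat{\beta}_1 = \frac{Y_2 - Y_1}{T_2 - T_1} \tag{1.3.3}$$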


where the “hat” on $\hat{\beta}_0$ and $\hat{\beta}_1$ indicates an estimate. The $\hat{k}$ curve passes
through both experimental points. Note that $T_1$ and $T_2$ cannot be the same
temperature.

For $n > 2$, a straight line (i.e., a linear curve) cannot simultaneously pass
through all the points shown in Fig. 1.4. One can, however, imagine a
number of strategies to place the line. For example, one could draw a line
by eye through the data. After measuring the intercept and slope, $\beta_0$ and
$\beta_1$ would be estimated. This has a number of advantages including simplicity
and a visual check of the “fit.” Moreover, all of us have had experience
with this method. There are some severe shortcomings, however, including
the lack of reproducibility: different observers draw the line rather differently.
Equally important is the disadvantage that the method does not
lend itself to direct extension to more complex cases.

Other relatively straightforward methods, such as the method of sequential
differences, are discussed by Rabinowicz [18].

The well-known method of least squares can be utilized to meet the
objections noted above. The sum of squares of the errors,

$$S = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 T_i)^2 \tag{1.3.4}$$


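is minimized with respect to the parameters $\beta_0$ and $\beta_1$. The sum of squares
function $S$ must be equal to or greater than zero simply because it is the
sum of $n$ terms, each of which is a square. $S$ can be zero if and only if
every measurement $Y_i$ is on the line.

We can expand the $S$ expression given by (1.3.4) to get (omitting, as we
sometimes do, the explicit designation of limits)

$$S = \sum Y_i^2 - 2\beta_0 \sum Y_i - 2\beta_1 \sum Y_i T_i + 2\beta_0 \beta_1 \sum T_i + n\beta_0^2 + \beta_1^2 \sum T_i^2 \tag{1.3.5}$$

showing that in the three-dimensional coordinate system $S$ is an elliptical
paraboloid with one minimum. See Fig. 1.5. Note that (1.3.5) is of second
degree in $\beta_0$ and $\beta_1$. Differentiate $S$ with respect to $\beta_0$ and $\beta_1$ to obtain

$$\frac{\partial S}{\partial \beta_0} = -2\sum Y_i + 2n\beta_0 + 2\beta_1 \sum T_i, \qquad \frac{\partial S}{\partial \beta_1} = -2\sum Y_i T_i + 2\beta_0 \sum T_i + 2\beta_1 \sum T_i^2 \tag{1.3.6}$$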
Both derivatives in (1.3.6) are linear in $\beta_0$ and $\beta_1$. A necessary condition
for a minimum is that both the derivatives in (1.3.6) be zero.

Setting these derivatives equal to zero and solving simultaneously yields

$$\hat{\beta}_0 = \frac{(\sum Y_i)(\sum T_i^2) - (\sum T_i)(\sum Y_i T_i)}{n \sum T_i^2 - (\sum T_i)^2}, \qquad \hat{\beta}_1 = \frac{n \sum Y_i T_i - (\sum T_i)(\sum Y_i)}{n \sum T_i^2 - (\sum T_i)^2} \tag{1.3.7}$$

Introduction of these values into (1.3.1) yields an estimate of $k$ which is
designated $\hat{k}$. A residual is defined by

$$\text{Residual} = e_i = Y_i - \hat{k}_i \tag{1.3.8}$$

which is not identical to the error, $\epsilon_i$.

In this example it is implied that $\epsilon_i$ is completely unknown. When this is
true, the least squares procedure just given is recommended. If, however,
one knows that $\epsilon_i$ has a variance of $\sigma_i^2$ (this term is discussed in Chapter 2),
some other estimation procedure might be better.




Example 1.3.1

The thermal conductivity of air in units of W/m-K versus temperature in kelvin
has been measured to be the following values:

    T (K)        300      350      400      450
    k (W/m-K)    0.0255   0.0309   0.0350   0.0377

Find estimates of the parameters $\beta_0$ and $\beta_1$ in (1.3.1) using least squares.

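Solution

$\sum T_i = 300 + 350 + 400 + 450 = 1500$
$\sum T_i^2 = 300^2 + 350^2 + 400^2 + 450^2 = 575000$
$\sum Y_i = 0.0255 + 0.0309 + 0.0350 + 0.0377 = 0.1291$
$\sum Y_i T_i = 0.0255(300) + \cdots + 0.0377(450) = 49.43$

Then $\hat{\beta}_0$ and $\hat{\beta}_1$ from (1.3.7) are

$$\hat{\beta}_0 = \frac{0.1291(575000) - 1500(49.43)}{4(575000) - (1500)^2} = 0.00175 \ \text{W/m-K}$$

$$\hat{\beta}_1 = \frac{4(49.43) - 1500(0.1291)}{4(575000) - (1500)^2} = 0.0000814 \ \text{W/m-K}^2$$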


Note that the parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ have units, each being different. The
residuals are −0.00067, 0.00066, 0.00069, and −0.00068, which have a zero sum.
The minimum sum of squares, which is the sum of the square of each of these terms,
is $1.823 \times 10^{-6}$.

In calculations such as these, at least an electronic calculator usually is
needed because there frequently are small differences of large numbers.
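The arithmetic of Example 1.3.1 can also be checked with a few lines of code. The following is a minimal sketch, not part of the original text: the function name `fit_line` is an illustrative choice, and the sums are those of the closed-form least squares solution (1.3.7) for the straight-line model $k = \beta_0 + \beta_1 T$.

```python
# Least squares fit of k = beta0 + beta1*T using the closed-form
# solution (1.3.7). Data are those of Example 1.3.1.

def fit_line(T, Y):
    """Return (beta0_hat, beta1_hat) computed from the sums in (1.3.7)."""
    n = len(T)
    sum_T = sum(T)
    sum_T2 = sum(t * t for t in T)
    sum_Y = sum(Y)
    sum_YT = sum(y * t for y, t in zip(Y, T))
    denom = n * sum_T2 - sum_T ** 2   # zero only if all T values coincide
    beta0 = (sum_Y * sum_T2 - sum_T * sum_YT) / denom
    beta1 = (n * sum_YT - sum_T * sum_Y) / denom
    return beta0, beta1

T = [300.0, 350.0, 400.0, 450.0]        # K
Y = [0.0255, 0.0309, 0.0350, 0.0377]    # W/m-K

beta0, beta1 = fit_line(T, Y)
residuals = [y - (beta0 + beta1 * t) for y, t in zip(Y, T)]
S_min = sum(e * e for e in residuals)

print(f"beta0 = {beta0:.5f} W/m-K")
print(f"beta1 = {beta1:.7f} W/m-K^2")
print("residuals:", [round(e, 5) for e in residuals])
print(f"S_min = {S_min:.3e}")
```

The numerators in (1.3.7) are differences of nearly equal products, which is the small-difference-of-large-numbers effect noted above; double precision arithmetic handles this data set without difficulty.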


1.3.2 Linear First Order Differential Equation Model

A case for which a fundamental law can be invoked is that of dropping a
thin plate initially at temperature $T_0$ into a fluid at $T_\infty$. From the first law
of thermodynamics one can derive [19]

$$\frac{dT}{dt} = \beta (T_\infty - T) \tag{1.3.9a}$$





where $\beta$ is considered the unknown parameter. Unlike the previous example,
$\beta$ has clear physical significance; it is given by the group

$$\beta = \frac{h}{\rho c_p L}$$

where $h$ is the heat transfer coefficient, $\rho$ density, $c_p$ specific heat, and $L$
the half-thickness of the plate. It is not possible to estimate $h$, $\rho$, $c_p$, and $L$
independently when given only measurements of $T$. One of the concepts in
estimation, called identifiability, relates to the question of which parameter
or groups of parameters can be uniquely estimated. See Section 1.5 and
Appendix A.

In addition to (1.3.9a) an initial condition is needed to obtain a solution;
an appropriate one is

$$T(0) = T_0 \tag{1.3.9b}$$

The solution of (1.3.9) for constant $\beta$ and $T_\infty$ is

$$T(t) = T_\infty + (T_0 - T_\infty) e^{-\beta t} \tag{1.3.10}$$

In the classical problem one stops at this point. In the estimation problem,
measurements of $T$ are used to estimate $\beta$.

Temperature data for the cooling of a plate using a single thermocouple
(a temperature sensor) and uniform time spacing are shown in Fig. 1.6.
Note that even though the differential equation is linear, $T(t, \beta)$ in (1.3.10)
is a nonlinear function of $\beta$; that is, the derivative of (1.3.10) with respect
to $\beta$ is a function of $\beta$, unlike $k$ given by (1.3.1). This is discussed further in
Section 1.4.



Figure 1.6 Simulated temperature measurements of a cooled thin plate. 





The simplest method of estimation involves the use of three temperature
measurements including $T_0$ and $T_\infty$. Observe that at least three measurements
are needed although only one parameter is being estimated. This is
in contrast with the preceding case for which measurements at two
temperatures were sufficient to estimate two parameters.

For measurements of $T_0$, $T_\infty$, and $T_1$ at $t_1$, designated $Y_0$, $Y_\infty$, and
$Y_1$, an estimate of $\beta$ is

$$\hat{\beta} = -\frac{1}{t_1} \ln \frac{Y_1 - Y_\infty}{Y_0 - Y_\infty} \tag{1.3.11}$$

As either $t_1 \to 0$, corresponding to $T_1 \to T_0$, or $t_1 \to \infty$, corresponding to
$T_1 \to T_\infty$, the error in $\hat{\beta}$ due to some small error in $T_1$ becomes very large.
Thus $\hat{\beta}$ is more sensitive to errors at some measurement times than others,
which suggests the subjects of sensitivity and optimum experimental design
(see Chapter 8).
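The growth of the error at early and late times can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than a method from the text: it takes the one-point estimate $\hat{\beta} = -(1/t_1)\ln[(Y_1 - Y_\infty)/(Y_0 - Y_\infty)]$, differentiates it with respect to $Y_1$ to get $|\partial\hat{\beta}/\partial Y_1| = 1/[t_1 (Y_1 - Y_\infty)]$, and evaluates that gain along the exponential cooling curve. The parameter values and the names `beta_hat` and `error_gain` are hypothetical.

```python
import math

def beta_hat(t1, Y1, Y0, Yinf):
    """One-point estimate of beta, as in (1.3.11)."""
    return -math.log((Y1 - Yinf) / (Y0 - Yinf)) / t1

def error_gain(t1, beta, T0, Tinf):
    """|d(beta_hat)/d(Y1)| evaluated on the exact cooling curve."""
    T1 = Tinf + (T0 - Tinf) * math.exp(-beta * t1)
    return 1.0 / (t1 * abs(T1 - Tinf))

# Hypothetical values chosen only for illustration.
beta_true, T0, Tinf = 0.1, 300.0, 100.0

times = [0.5, 2.0, 5.0, 10.0, 20.0, 50.0, 80.0]
gains = [error_gain(t, beta_true, T0, Tinf) for t in times]
for t, g in zip(times, gains):
    print(f"t1 = {t:5.1f}   |d beta_hat / d Y1| = {g:.3e}")

# Effect of a 1-degree error in Y1 at an early, a middle, and a late time.
for t in (0.5, 10.0, 80.0):
    T1 = Tinf + (T0 - Tinf) * math.exp(-beta_true * t)
    print(t, beta_hat(t, T1 + 1.0, T0, Tinf) - beta_true)
```

For this model the gain equals $e^{\beta t_1}/[t_1 (T_0 - T_\infty)]$, which is smallest at $t_1 = 1/\beta$ and grows without bound toward both ends of the time range.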

If $T_0$ and $T_\infty$ are not precisely known they can also be considered
parameters like $\beta$. They are dissimilar from $\beta$ in that (a) they are particular
values of the dependent variable (termed the state variable in the systems
literature) and (b) repeated measurements of these are available in this
particular example.

One other parameter that could be estimated for this problem is the
starting time. The starting time can be seen to be an unknown if one
imagines several successive digital temperature measurements taken before
the plate is dropped into the fluid. At the instant the plate contacts the
fluid the plate's temperature rapidly changes; see Fig. 1.6. The time at
which the plate contacts the fluid might not correspond to the instant at
which any measurement was taken.

Suppose that the starting time is known to be zero and that all the
measurements are for $t > 0$. For finding estimates of any combination of
the parameters $T_0$, $T_\infty$, and $\beta$ one could start with the sum of squares for $n$
measurements

$$S = \sum_{i=1}^{n} \left[ Y_i - T_\infty - (T_0 - T_\infty) e^{-\beta t_i} \right]^2 \tag{1.3.12}$$

and minimize it with respect to the parameters. The derivatives of (1.3.12)
are linear in terms of $T_0$ and $T_\infty$ but nonlinear in terms of $\beta$. The
nonlinearity complicates the search for a minimum.

One way to minimize $S$ with respect to a nonlinear parameter is simply
to plot $S$ versus that parameter and graphically find the minimum. This is
a slow procedure, but can give insight.





Example 1.3.2

Suppose it is known that $T_\infty$ is equal to 100 and $T_0$ is equal to 300 and that two
measurements of $T$ are available, $Y_1 = 220$ at 5 sec and $Y_2 = 170$ at 10 sec. Estimate
using least squares the parameter $\beta$ in (1.3.10). Use a trial and error approach.

Solution

A first estimate of $\beta$ can be obtained from (1.3.11) using the first observation. We
obtain

$$\hat{\beta} = -\frac{1}{5} \ln \frac{220 - 100}{300 - 100} = 0.1022$$

Let us then evaluate the sum of squares function $S$ in the neighborhood of that
value. From (1.3.12) we can write

$$S = (120 - 200 e^{-5\beta})^2 + (70 - 200 e^{-10\beta})^2$$

which we evaluate at $\beta = 0.1022$ to find $S = 3.901$. Now another value of $\beta$ must be
tried. Let us try $\beta = 0.1$; this gives $S = 14.493$. Because this $S$ value is bigger than
the value for $\beta = 0.1022$, let us try a larger value than $\beta = 0.1022$. At $\beta = 0.11$,
$S = 32.988$. Hence the minimum must be between $\beta = 0.1$ and 0.11 and is probably
nearer the first value. Let us try 0.103, which gives $S = 2.214$. Then the minimum $S$
must be between $\beta = 0.1022$ and 0.11. A further value of $\beta = 0.105$ yields $S = 2.853$
and thus $\beta$ must be between 0.1022 and 0.105, which region could be explored
further. One could continue further in this trial and error manner to estimate $\beta$
more accurately. This is a possible approach but it is very tedious and time consuming,
particularly if more than one parameter is present. More direct methods of
minimizing $S$ are given in Chapter 7.

It is instructive to plot the function $S$ for this case. See Fig. 1.7. Note that the
minimum is near $\beta = 0.1$ and a local maximum is approached at large $\beta$. Thus in
addition to $\partial S/\partial \beta$ being equal to zero near $\beta = 0.1$, it also approaches zero as
$\beta \to \infty$. Even more ill-behaved $S$ functions are possible. See Problem 1.5.

Figure 1.7 Sum of squares function for exponential example.
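The trial and error search of Example 1.3.2 is easy to mechanize. The sketch below is only an illustration (the grid spacing and the variable names are arbitrary choices): it evaluates $S(\beta) = (120 - 200e^{-5\beta})^2 + (70 - 200e^{-10\beta})^2$ at the trial values used in the example and then scans a uniform grid over the final bracketing interval, 0.1022 to 0.105.

```python
import math

def S(beta):
    """Sum of squares for Example 1.3.2 (T_inf = 100, T_0 = 300)."""
    return ((120.0 - 200.0 * math.exp(-5.0 * beta)) ** 2
            + (70.0 - 200.0 * math.exp(-10.0 * beta)) ** 2)

# Trial values used in the example.
for b in (0.1022, 0.1, 0.11, 0.103, 0.105):
    print(f"beta = {b:<7} S = {S(b):.3f}")

# Refine on a uniform grid over the bracketing interval found by hand.
lo, hi, steps = 0.1022, 0.105, 280
grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
beta_best = min(grid, key=S)
print(f"grid minimum near beta = {beta_best:.5f}, S = {S(beta_best):.4f}")
```

A grid scan like this scales poorly as the number of parameters grows, which is one motivation for the more direct minimization methods of Chapter 7.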


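1.3.3 Partial Differential Equation Example

Consider again the same physical problem of a plate dropped suddenly
into a fluid. Instead of negligible internal resistance ($Bi = hL/k < 0.1$),
assume that there is a significant variation of temperature across the plate.
The describing equations for constant properties and a plate of width $2L$
are [19]

$$k \frac{\partial^2 T}{\partial x^2} = \rho c_p \frac{\partial T}{\partial t} \qquad (0 < x < 2L) \tag{1.3.13}$$

$$k \frac{\partial T}{\partial x}\bigg|_{x=0} = h\,[T(0,t) - T_\infty], \qquad -k \frac{\partial T}{\partial x}\bigg|_{x=2L} = h\,[T(2L,t) - T_\infty] \tag{1.3.14a,b}$$

$$T(x, 0) = T_0 \tag{1.3.15}$$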

This is a problem which is linear in the dependent variable, $T$. For the
estimation problem we can consider $T$ as a function of a number of
variables,

$$T = T(x, t, k, \rho, c_p, h, T_0, T_\infty, L) \tag{1.3.16}$$

In parameter estimation one must be able to solve the model repeatedly
for different parameter values. For this example an exact solution is
available as an infinite series, but it may be easier to approximate the
solution using a finite-difference representation. Such a solution can also
be modified to treat nonlinearities entering in either the differential equation
or the boundary conditions.
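A finite-difference solution of this kind might look like the following sketch. It is not the book's algorithm: it assumes the constant-property conduction equation $k\,\partial^2 T/\partial x^2 = \rho c_p\,\partial T/\partial t$ on $0 < x < 2L$, convection with coefficient $h$ to a fluid at $T_\infty$ on both faces, and a uniform initial temperature $T_0$, and it uses a simple explicit scheme. The function name, the grid size, and the property values are hypothetical.

```python
import math

def plate_temperatures(k, rho_cp, h, L, T0, Tinf, t_end, n_cells=20, r=0.25):
    """Explicit finite differences for a plate of width 2L cooled on both faces.

    Interior nodes use the three-point conduction stencil; each surface
    node uses an energy balance on a half cell with convection.
    """
    alpha = k / rho_cp
    dx = 2.0 * L / n_cells
    dt = r * dx * dx / alpha            # r = alpha*dt/dx^2
    biot_cell = h * dx / k
    assert r * (1.0 + biot_cell) <= 0.5, "explicit scheme would be unstable"
    T = [T0] * (n_cells + 1)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        new = T[:]
        for i in range(1, n_cells):
            new[i] = T[i] + r * (T[i + 1] - 2.0 * T[i] + T[i - 1])
        new[0] = T[0] + 2.0 * r * ((T[1] - T[0]) + biot_cell * (Tinf - T[0]))
        new[-1] = T[-1] + 2.0 * r * ((T[-2] - T[-1]) + biot_cell * (Tinf - T[-1]))
        T = new
    return T

# Hypothetical property values giving Bi = hL/k = 0.01.
k, rho_cp, h, L = 50.0, 4.0e6, 50.0, 0.01
T0, Tinf, t_end = 300.0, 100.0, 400.0
T = plate_temperatures(k, rho_cp, h, L, T0, Tinf, t_end)
beta = h / (rho_cp * L)
T_lumped = Tinf + (T0 - Tinf) * math.exp(-beta * t_end)
print("surface, mid-plane:", T[0], T[len(T) // 2])
print("lumped model:", T_lumped)
```

With the small Biot number chosen here the mid-plane temperature stays close to the lumped solution $T_\infty + (T_0 - T_\infty)e^{-\beta t}$ with $\beta = h/(\rho c_p L)$; raising $h$ or lowering $k$ increases the variation across the plate. In an estimation setting a routine like this would be called repeatedly with different trial parameter values.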

Note that in this example several different kinds of measurements are
required: temperature, time, and length. Measurements of the initial conditions
and boundary conditions may not be sufficient for parameter estimation;
interior measurements may be needed (identifiability). The location
of sensors and duration of the experiment are studied in connection with
optimum experiments.





Another aspect of identifiability is the determination of what parameters
or groups of parameters can be uniquely estimated. For example, (1.3.13)
and (1.3.14) can be divided by $k$ to yield the groups $\rho c_p/k$ and $h/k$. Since
no term in these groups appears elsewhere in the problem, one would
anticipate that these groups could be simultaneously estimated. That this is
not always true can be proved by noting that this physical problem is
identical to the one in Section 1.3.2, for which only the ratio of these two
parameters could be estimated. It happens that $Bi = hL/k$ must be equal
to approximately one or greater in order to estimate both. There may be
other conditions that would also preclude estimation for this example. The
condition for identifiability is discussed in Section 1.5.

In order to estimate the parameters one can again use the sum of
squares function. Instead of a single summation over time, one could have
a double summation over time and sensors located at different positions,

$$S = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ Y_j(i) - T_j(i) \right]^2 \tag{1.3.17}$$

The subscript $j$ is for position and $i$ is for time. There are $m$ discrete locations
and $n$ different times. $Y_j(i)$ designates an observation and $T_j(i)$ a value
obtained from the model.


1.4 SENSITIVITY COEFFICIENTS 

In this section a brief introduction to sensitivity coefficients is given.
Consider the true mathematical model to be given by $\eta(x, t, \boldsymbol{\beta})$ where $x$
and $t$ are independent variables and $\boldsymbol{\beta}$ is a parameter vector. The first
derivative of $\eta$ with respect to $\beta_i$ will be called the sensitivity coefficient for
$\beta_i$ and designated $X_i$,

$$X_i = \frac{\partial \eta}{\partial \beta_i} \tag{1.4.1}$$

On some occasions the right side of (1.4.1) is multiplied by $\beta_i$ and still
called simply a sensitivity coefficient.

Sensitivity coefficients are very important because they indicate the
magnitude of change of the response $\eta$ due to perturbations in the values
of the parameters. It is for this reason we have given $X_i$, defined by (1.4.1),
the name “sensitivity coefficient.” They appear in relation to many facets
of parameter estimation. The reader is urged to pay particular attention to
them and even to plot them versus their independent variable(s) if their
shapes are not obvious. One area where the sensitivity coefficients appear
is in the identifiability problem, which is briefly discussed in Section 1.5.
Another area where the $X_i$ appear is the Gauss method of linearizing the
estimation problem when the model is nonlinear in terms of parameters
(see Section 7.4). In the optimal design of experiments discussed in
Chapter 8, the sensitivities also play a key role.

The sensitivity coefficients also appear in a Taylor’s series for
$\eta(\beta_1, \ldots, \beta_p, t)$ about the neighborhood of the point $(b_1, \ldots, b_p)$, which we
shall denote $\mathbf{b}$. Provided $\eta$ has continuous derivatives near $\boldsymbol{\beta} = \mathbf{b}$, we can
write


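$$\begin{aligned} \eta(\beta_1, \ldots, \beta_p, t) = {} & \eta(\mathbf{b}, t) + \frac{\partial \eta(\mathbf{b}, t)}{\partial \beta_1} (\beta_1 - b_1) + \frac{\partial \eta(\mathbf{b}, t)}{\partial \beta_2} (\beta_2 - b_2) + \cdots \\ & + \frac{\partial^2 \eta(\mathbf{b}, t)}{\partial \beta_1^2} \frac{(\beta_1 - b_1)^2}{2!} + \frac{\partial^2 \eta(\mathbf{b}, t)}{\partial \beta_1 \partial \beta_2} (\beta_1 - b_1)(\beta_2 - b_2) + \cdots \end{aligned} \tag{1.4.2}$$

If the derivatives $\partial^{r+s} \eta / \partial \beta_i^r \partial \beta_j^s$ $(i, j = 1, \ldots, p)$ for $r + s > 1$ are zero, then
$\eta$ is said to be linear in the parameters. For $\eta$ a linear function of $\beta_1$ and
$\beta_2$, we can write

$$\eta(\beta_1, \beta_2, t) = \eta(b_1, b_2, t) + X_1 (\beta_1 - b_1) + X_2 (\beta_2 - b_2) \tag{1.4.3}$$

This relation is an equality rather than an approximation if both $X_1$ and $X_2$
are not functions of the parameters. Hence $\eta$ is linear in its parameters if all
the sensitivity coefficients are not functions of any parameter(s).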

Consider now some simple examples. The $\beta_1$, $\beta_2$, and $\beta_3$ sensitivity
coefficients for the algebraic model

$$\eta = \beta_1 + \beta_2 t + \beta_3 t^2 \tag{1.4.4}$$

are, respectively,

$$X_1 = 1, \qquad X_2 = t, \qquad X_3 = t^2 \tag{1.4.5}$$

Since each of these is independent of all the parameters, $\eta$ given by (1.4.4)
is linear in its parameters. Estimation involving models linear in the
parameters is generally easier and more direct than estimation involving
nonlinear parameters.

Another algebraic model which occurs in many fields is

$$\eta = \beta_1 \exp(\beta_2 t) \tag{1.4.6}$$





The $\beta_1$ and $\beta_2$ sensitivities of this equation are

$$X_1 = \exp(\beta_2 t), \qquad X_2 = \beta_1 t \exp(\beta_2 t) \tag{1.4.7}$$

which contain the parameters, and thus $\eta$ given by (1.4.6) is nonlinear in
terms of its parameters. If, however, the only parameter of interest is $\beta_1$, $\eta$
is linear in terms of $\beta_1$.
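Whether a model is linear in its parameters can be probed numerically by evaluating the sensitivity coefficients at several parameter points. The sketch below is an illustration, not a procedure from the text: it approximates $X_i = \partial\eta/\partial\beta_i$ by central differences for a quadratic model $\eta = \beta_1 + \beta_2 t + \beta_3 t^2$ and for the exponential model $\eta = \beta_1 \exp(\beta_2 t)$. The helper name `sensitivities`, the step size, and the parameter values are assumptions made for the example.

```python
import math

def sensitivities(model, beta, t, rel_step=1e-6):
    """Central-difference approximation of X_i = d(eta)/d(beta_i) at time t."""
    X = []
    for i, b in enumerate(beta):
        step = rel_step * max(abs(b), 1.0)
        up = list(beta)
        up[i] = b + step
        dn = list(beta)
        dn[i] = b - step
        X.append((model(up, t) - model(dn, t)) / (2.0 * step))
    return X

def quadratic(beta, t):        # eta = beta1 + beta2*t + beta3*t^2
    return beta[0] + beta[1] * t + beta[2] * t ** 2

def exponential(beta, t):      # eta = beta1*exp(beta2*t)
    return beta[0] * math.exp(beta[1] * t)

t = 2.0
# Two arbitrary parameter points per model (illustrative values only).
for beta in ([1.0, 2.0, 3.0], [-4.0, 0.5, 10.0]):
    print("quadratic  ", beta, sensitivities(quadratic, beta, t))
for beta in ([1.0, -0.5], [3.0, -0.2]):
    print("exponential", beta, sensitivities(exponential, beta, t))
```

For the quadratic model the printed coefficients are the same at both parameter points, while for the exponential model they change with the parameters, which is the numerical signature of a model that is nonlinear in its parameters.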

The evaluation of sensitivity coefficients need not begin with an expression
for $\eta$ but could be initiated with the given differential equation. For
example, if the derivative of (1.3.9a) (a linear differential equation) is taken
with respect to $\beta$, we obtain

$$\frac{dX}{dt} = -(T - T_\infty) - \beta X \tag{1.4.8a}$$

$$X(0) = 0 \tag{1.4.8b}$$

where $X = \partial T/\partial \beta$. Equation 1.4.8a is termed the sensitivity equation for this case and, together
with (1.4.8b), constitutes a statement of the sensitivity problem. In (1.4.8a)
it is assumed that $T$ (or $\eta$ in the notation of this section) is a known
function obtained from a previous solution of the original differential
equation and initial condition. Since $\beta$ appears explicitly in (1.4.8a), the
sensitivity coefficient $X$ is a function of $\beta$. Consequently the dependent
variable $T$ is nonlinear in $\beta$, as can be verified by differentiating (1.3.10).
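As a check, the sensitivity problem can be compared with direct differentiation of the exponential solution $T(t) = T_\infty + (T_0 - T_\infty)e^{-\beta t}$. The short derivation below is a worked verification added for illustration; it assumes that $T_0$ and $T_\infty$ are constants that do not depend on $\beta$.

```latex
\begin{align*}
T(t) &= T_\infty + (T_0 - T_\infty)\,e^{-\beta t}, \\
X(t) &\equiv \frac{\partial T}{\partial \beta}
      = -t\,(T_0 - T_\infty)\,e^{-\beta t}, \qquad X(0) = 0, \\
\frac{dX}{dt} &= -(T_0 - T_\infty)\,e^{-\beta t}
               + \beta t\,(T_0 - T_\infty)\,e^{-\beta t}
              = -(T - T_\infty) - \beta X .
\end{align*}
```

The last line reproduces the sensitivity equation, and the explicit form of $X(t)$ shows its dependence on $\beta$.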


1.5 IDENTIFIABILITY 

There are some models for which it is not possible to uniquely estimate all 
the parameters from measurements. Rather it is possible to estimate only 
certain functions of them. This is part of the identifiability problem. See 
Appendix A for a derivation of an identifiability criterion. 

In this section several simple cases for which one cannot uniquely 
estimate all the parameters are discussed. Later an identifiability criterion 
utilizing sensitivity coefficients is introduced and related to some of the 
cases previously investigated. 

A model that will not permit estimation of both $\beta_1$ and $\beta_2$ is

$$\eta_i = (\beta_1 + A \beta_2) f(t_i) \tag{1.5.1}$$

where $A$ is a constant and $f(t)$ is any known function of $t$. In this case one
can only estimate $\beta = \beta_1 + A \beta_2$ given measurements of $\eta_i$ versus $t_i$. In the
$S$, $\beta_1$, $\beta_2$ space, $S$ does not have a unique minimum, but instead has a
minimum along a line which projects into the $\beta_1, \beta_2$ plane; see Fig. 1.8.

Figure 1.8 Contours of minimum $S$ for various cases where not all the parameters can be
uniquely estimated.

Consider next the model

                                                        (1.5.2)

From inspection we see that $\beta_1$ and $\beta_2$ can be replaced by the product
$\beta = \beta_1 \beta_2$ and that any combination of $\beta_1$ and $\beta_2$ whose product is equal to $\beta$ would yield the
same value of $\eta$ for a given $t$. In terms of the three-dimensional space of $S$
plotted versus $\beta_1$ and $\beta_2$, there is a minimum $S$ along a curved line which
projects into $\beta_2 = \beta/\beta_1$ in the $\beta_1, \beta_2$ plane, as shown in Fig. 1.8.

A very similar case to (1.5.2) is

                                                        (1.5.3)

where again only the ratio $\beta$ is unique and various combinations of $\beta_1$ and
$\beta_2$ could be given to provide $\beta = \beta_1/\beta_2 = \text{constant}$. In the $S$, $\beta_1$, and $\beta_2$
coordinates, the minimum of $S$ occurs along a straight line $\beta_1 = \beta \beta_2$
projected into the $\beta_1, \beta_2$ plane. In Fig. 1.8 this line passes through the
origin.

A less obvious case is for the model

$$\eta_i = \frac{\beta_1}{\beta_2 + \beta_3 t_i} \tag{1.5.4}$$





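Dividing the numerator and denominator by $\beta_1$ yields

$$\eta_i = \frac{1}{\alpha_1 + \alpha_2 t_i}, \qquad \alpha_1 = \beta_2/\beta_1, \quad \alpha_2 = \beta_3/\beta_1 \tag{1.5.5}$$

where it is seen that $\eta_i$ is a function of $\alpha_1$, $\alpha_2$, and $t_i$, and thus only $\alpha_1$ and
$\alpha_2$ can be simultaneously estimated.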

Another simple case where all three parameters cannot be uniquely 
estimated is for 




(1.5.6a) 



(1.5.6b) 

where only a, and jGj can be found. 




There are other cases where the parameters cannot (easily) be uniquely
estimated if measurements are made only over a certain range of the
independent variable or at certain values. One example is

$$\eta_i = \beta_1 + \beta_2 (10 + t_i) \tag{1.5.7}$$

for max$|t_i|$ small compared to unity. For such a model it is possible to
estimate accurately only $\beta_1 + 10\beta_2$ if $|t_i|$ is small. This model is thus similar
to (1.5.1) for small $|t_i|$. For sufficiently “large” $|t_i|$, both $\beta_1$ and $\beta_2$ can be
estimated. Another example is for the model

$$\eta_i = \beta_1 t_i + \beta_2 \sin \beta_3 t_i \tag{1.5.8a}$$

for small $\beta_3 t_i$, since then $\eta_i$ can be approximated by

$$\eta_i = (\beta_1 + \beta_2 \beta_3) t_i \tag{1.5.8b}$$

Hence for small max$|\beta_3 t_i|$, instead of being able to estimate uniquely all
three parameters we can estimate only $\beta_1 + \beta_2 \beta_3$.
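The near-dependence in the first of these examples can be made quantitative. The sketch below is one possible illustration and is not taken from the text: for the model $\eta_i = \beta_1 + \beta_2(10 + t_i)$ it forms the $2 \times 2$ matrix $\mathbf{X}^T\mathbf{X}$ from the two sensitivity coefficients, $1$ and $10 + t_i$, and reports its condition number (ratio of largest to smallest eigenvalue) for a narrow and a wide range of $t$. The measurement times and the function name are hypothetical.

```python
import math

def condition_number(times):
    """Condition number of X^T X for eta = beta1 + beta2*(10 + t)."""
    X1 = [1.0 for _ in times]          # d(eta)/d(beta1)
    X2 = [10.0 + t for t in times]     # d(eta)/d(beta2)
    a = sum(x * x for x in X1)
    b = sum(x * y for x, y in zip(X1, X2))
    c = sum(y * y for y in X2)
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lam_max = mean + radius
    lam_min = (a * c - b * b) / lam_max   # determinant / largest eigenvalue
    return lam_max / lam_min

small = [-0.1 + 0.02 * i for i in range(11)]    # |t| <= 0.1
large = [-50.0 + 10.0 * i for i in range(11)]   # |t| <= 50
print("small |t|:", condition_number(small))
print("large |t|:", condition_number(large))
```

A very large condition number means the two sensitivity coefficients are nearly proportional over the data, so that small measurement errors translate into large errors in the separate estimates of $\beta_1$ and $\beta_2$.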
Many other cases could be cited that demonstrate that only certain
functions of parameters can be estimated from measurements of $\eta_i$ versus
its independent variable(s). Some of these cases may not be at all obvious.
This is particularly true where there are a number of parameters and the
model is a differential equation. Rather than depending upon being able to
manipulate the model so that groups of parameters appear, we would be
helped by having some criterion that could be applied to the above
algebraic models and also to models involving differential equations. In
the latter case we imagine that the solutions of the equations and the
sensitivity coefficients are available in graphical or tabular form. It turns
out in the algebraic cases above, as well as for other cases involving
differential equations, that the sensitivity coefficients can provide insight
into the cases for which parameters can and cannot be estimated.

Parameters can be estimated if the sensitivity coefficients over the range of
the observations are not linearly dependent. This is the criterion that we shall
use to determine if the parameters can be simultaneously estimated
without ambiguity. See Appendix A for a derivation of this criterion.
Linear dependence occurs when for $p$ parameters the relation

$$C_1 \frac{\partial \eta_i}{\partial \beta_1} + C_2 \frac{\partial \eta_i}{\partial \beta_2} + \cdots + C_p \frac{\partial \eta_i}{\partial \beta_p} = 0 \tag{1.5.9}$$

is true for all $i$ observations and for not all the $C_j$ values equal to zero.
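Let us illustrate the above criterion for a few examples. For (1.5.1) note
that

$$\frac{\partial \eta_i}{\partial \beta_1} = f(t_i), \qquad \frac{\partial \eta_i}{\partial \beta_2} = A f(t_i)$$

and thus, if $C_1 = A$ and $C_2 = -1$, (1.5.9) is satisfied. Consequently, both $\beta_1$
and $\beta_2$ cannot be estimated simultaneously.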
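Another example involves (1.5.4) for which

$$\frac{\partial \eta_i}{\partial \beta_1} = \frac{1}{\beta_2 + \beta_3 t_i}, \qquad \frac{\partial \eta_i}{\partial \beta_2} = \frac{-\beta_1}{(\beta_2 + \beta_3 t_i)^2}, \qquad \frac{\partial \eta_i}{\partial \beta_3} = \frac{-\beta_1 t_i}{(\beta_2 + \beta_3 t_i)^2}$$

It is not immediately obvious from an inspection of these sensitivity
relations that there is linear dependence. It can be verified, however, that if
$C_1 = \beta_1$, $C_2 = \beta_2$, and $C_3 = \beta_3$, linear dependence exists; in equation form
we then have

$$\beta_1 \frac{\partial \eta_i}{\partial \beta_1} + \beta_2 \frac{\partial \eta_i}{\partial \beta_2} + \beta_3 \frac{\partial \eta_i}{\partial \beta_3} = 0$$

which form can occur in various cases with linear dependence. The
dependent variable $\eta$ and the sensitivity coefficients for the model (1.5.4)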
are depicted in Fig 1 9 for Pj= 1 ft is strongly recommended that the 
sensitivity coefficients be plotted and carefully examined to see if linear
dependence exists or even is approached. The relation given above between
the coefficients can be approximately verified by graphically adding
the three together to obtain zero at each instant of time. Furthermore note
that the /9, and sensitivities seem to have approximately proportional 
magnitudes for greater than 3 This means that not only is it impossible 





Figure 1.9 Dependent variable $\eta$ and sensitivity coefficients for $\eta = \beta_1/(\beta_2 + \beta_3 t)$ with


to estimate $\beta_1$, $\beta_2$, and $\beta_3$ simultaneously from measurements of $\eta$ versus $t$,
but it is difficult to estimate only and /Jj using data for if < 1 . 
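The linear dependence criterion can also be checked numerically instead of graphically. The following sketch is an illustration under assumed values, not a procedure given in the text: for the model $\eta = \beta_1/(\beta_2 + \beta_3 t)$ it tabulates the three sensitivity coefficients at a set of times, confirms that the combination with $C_j = \beta_j$ vanishes at every time, and shows that the determinant of $\mathbf{X}^T\mathbf{X}$ is zero to round-off. The parameter values, the times, and the helper names are hypothetical.

```python
def sens(beta, t):
    """Sensitivity coefficients of eta = beta1/(beta2 + beta3*t)."""
    b1, b2, b3 = beta
    D = b2 + b3 * t
    return [1.0 / D, -b1 / D ** 2, -b1 * t / D ** 2]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

beta = [2.0, 1.0, 0.5]                 # illustrative values only
times = [0.5 * j for j in range(1, 13)]
X = [sens(beta, t) for t in times]     # one row per observation

# Linear dependence with C_j = beta_j: the weighted sum vanishes at every time.
combo = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
print("max |sum_j beta_j X_j| =", max(abs(c) for c in combo))

# X^T X is then singular (zero determinant up to round-off).
XtX = [[sum(row[p] * row[q] for row in X) for q in range(3)] for p in range(3)]
print("det(X^T X) =", det3(XtX))
```

A determinant that is zero, or very small relative to the size of the matrix entries, signals that the parameters cannot all be estimated from the data at hand.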


1.6 SUMMARY AND CONCLUSIONS 


1. Parameter estimation is a discipline that provides tools for the efficient
   use of data for aiding in mathematical modeling of phenomena and
   the estimation of constants appearing in these models. The problem of
   estimating parameters is that of finding constants appearing in an
   equation describing a system as suggested by Fig. 1.3.

2. One way to estimate the parameters for a large variety of models is to
   use least squares, which involves minimizing the sum of squares of
   differences between measurements and model values. The minimization
   problem can be either linear or nonlinear.

3. One cannot always independently estimate all the parameters that
   appear in the model. It is clear that not all the parameters may be
   estimated if parameters appear in groups, but in some cases not even





REFERENCES 

1. Legendre, A. M., Nouvelles Méthodes pour la Détermination des Orbites des Comètes,
   Paris, 1806.

2. Gauss, K. F., Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic
   Sections, 1809; reprinted by Dover Publications, Inc., New York, 1963.

3. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley & Sons, Inc., New
   York, 1966.

4. Box, G. E. P. and Jenkins, G. M., Time Series Analysis: Forecasting and Control,
   Holden-Day, Inc., San Francisco, 1970.

5. Myers, R. H., Response Surface Methodology, Allyn and Bacon, Inc., Boston, 1971.

6. Bard, Y., Nonlinear Parameter Estimation, Academic Press, New York, 1974.

7. Sage, A. P. and Melsa, J. L., Estimation Theory with Applications to Communications and
   Control, McGraw-Hill Book Co., New York, 1971.

8. Sage, A. P., Optimum Systems Control, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1968.

9. Deutsch, R., Estimation Theory, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.

10. Bryson, A. E., Jr. and Ho, Yu-Chi, Applied Optimal Control: Optimization, Estimation and
    Control, Blaisdell Publishing Co., Waltham, Mass., 1969.

11. Sage, A. P. and Melsa, J. L., System Identification, Academic Press, New York, 1971.

12. Graupe, D., Identification of Systems, Van Nostrand Reinhold Co., New York, 1972.

13. Mendel, J. M., Discrete Techniques of Parameter Estimation: The Equation Error
    Formulation, Marcel Dekker, Inc., New York, 1973.

14. Kmenta, J., Elements of Econometrics, The Macmillan Co., New York, 1971.

15. Bevington, P. R., Data Reduction and Error Analysis for the Physical Sciences,
    McGraw-Hill Book Company, New York, 1969.

16. Wolberg, J. R., Prediction Analysis, D. Van Nostrand Co., Inc., Princeton, N.J., 1967.

17. Lewis, T. O. and Odell, P. L., Estimation in Linear Models, Prentice-Hall, Inc.,
    Englewood Cliffs, N.J., 1970.

18. Rabinowicz, E., An Introduction to Experimentation, Addison-Wesley Publishing Co.,
    Reading, Mass., 1970.

19. Kreith, F., Principles of Heat Transfer, 3rd ed., Intext Educational Publishers, New
    York, 1973.

PROBLEMS 

1.1 The thermal conductivity $k$ has been found from four independent experiments
at different temperatures to be given by

    T_i (°C)    k (W/m-°C)
    100         90
    200         98
    300         111
    400         121

(a) Estimate $\beta_0$ and $\beta_1$ in (1.3.1), using least squares.

Answer. 78.5, 0.106

(b) Calculate the residuals.

Answer. 0.9, −1.7, 0.7, 0.1

(c) For $\beta_0 = 80$, plot $S$ versus $\beta_1$ in the neighborhood of the minimum.

1.2 (a) Derive using least squares an estimate of $\beta$ for the simple model

$$\eta_i = \beta$$

for $n$ measurements. Assume $Y_i = \eta_i + \epsilon_i$, $\epsilon_i$ being the measurement error.
(b) Also derive estimates for $\beta_0$ and $\beta_1$ for the model

$$\eta_i = \beta_0 + \beta_1 \sin t_i$$

1.3 Some actual measurements for the specific heat $c_p$ of Armco iron at room
temperature are, in units of kJ/kg-°C,

    i      1        2        3        4        5
    c_p    0.4287   0.4363   0.4451   0.4409   0.4442

    i      6        7        8        9        10
    c_p    0.4400   0.4400   0.4405   0.4375   0.4333

Using the model of Problem 1.2a, estimate $c_p$. Plot the residuals as a
function of $i$. What is the sum of the residuals?

1.4 (a) For the model

$$\eta_i = e^{\beta t_i}$$

and the data given below, estimate $\beta$ by plotting $S$ versus $\beta$. Cover the
range 0 to −20.0.

(b) Compare the curve with Fig. 1.7.

(c) Compare the residuals with the true errors ($\epsilon_i = Y_i - \eta_i$), also given below:

    t       Data, Y_i    Errors, ε_i
    0.25     0.419       −0.053
    0.5      0.204       −0.019
    0.75     0.159        0.054
    1.0     −0.106       −0.156
    1.25     0.042        0.0187


1.5 Plot $S$ versus $\beta$ for the model $\eta = 100 \sin \beta\theta$ with $\beta\theta$ in radians and for the
data $\theta_1 = 2.79$, $Y_1 = 34.2$; $\theta_2 = 6.98$, $Y_2 = 64.2$; and $\theta_3 = 8.38$, $Y_3 = 86$. Investigate
the range $0 < \beta < 1.1$ for $\Delta\beta$ increments at least as small as 0.1. (A
programmable calculator would be helpful to get the solution.) What conclusions
can you draw?

1.6 How should (1.3.11) be changed to permit estimation of the starting time, $t_0$,
and $T_\infty$? Assume that measurements are available for $t$ both less than and
greater than $t_0$. Also assume that the plate has been at $T_0$ for a “long” time
before $t_0$.

1.7 (a) For the model and data

    t_i    Y_i
    0      200
    1      55
    2      30
    3      20

calculate the sum of squares $S$ in the rectangular region $100 < \beta_1 < 300$
and $2.0 < \beta_2 < 4.0$. In particular, evaluate $S$ at $\beta_1 = 100$, 200, and 300
with $\beta_2 = 2.0$, 3.0, and 4.0.
(b) Is $\eta_i$ linear in $\beta_1$ and $\beta_2$?
(c) Based on the information in (a), estimate $\beta_1$ and $\beta_2$.
(d) Using the search procedure in (a), is it more or less than twice as much
work to find two parameters as it is to estimate one?

1.8 The current $i$ in the circuit of Fig. 1.10 after the switch S is closed satisfies
the differential equation


where $L$ is inductance, $R$ is resistance, and $E$ is voltage. An initial condition is
$i = i_0$ at

Note that a solution of $i$ is in terms of $t$, $L$, $R$, $E$, and $i_0$.







(a) What is (are) the dependent variable(s)?

(b) What is (are) the independent variable(s)?

(c) What is (are) the state(s)?

(d) What could be termed parameters?

(e) What could be termed property?

(f) The solution of the problem is


Is $i$ linear in $E$? $R$? $L$? $i_0$?

(g) What parameters or groups of parameters can be estimated given
measurements of $i$?

1.9 For the following expressions of the model $\eta_i$, indicate for the various $\beta_j$
values if $\eta_i$ is linear or nonlinear in terms of them.


(a) $\eta_i = \beta_1 + \beta_2 \sin \pi t_i$
ib) = 


2=1 ^ 

1.10 For the following expressions for the model $\eta$, derive expressions for the
sensitivity coefficients. Also plot the sensitivity coefficients and $\eta$ versus $\beta_2 t$.
For parts (b) and (c) graph $\eta/\beta_1$, $\partial\eta/\partial\beta_1$, and $(\beta_2/\beta_1)\,\partial\eta/\partial\beta_2$ versus $\beta_2 t$.
(If values of P-^ and P 2 are needed, let P\ = l and P 2 =^-) 


(a) $\eta = \beta_1 + \beta_2 t$

(b) $\eta = \beta_1 \cos \beta_2 t$   $(0 < \beta_2 t < 4\pi)$

(c) $\eta = \beta_1 (1 - e^{-\beta_2 t})$   $(0 < \beta_2 t < 3)$

1.11 For the model

$$\eta = \beta_1 (6t - t^3) + \beta_2 \sin t$$

where $t$ is in radians, plot the sensitivity coefficients for $-2 < t < 2$. Suppose
$\beta_1$ and $\beta_2$ are both to be uniquely estimated. Over what range (if any) do the
coefficients appear to be linearly dependent?

1.12 Find a linear relation between the sensitivity coefficients for $\beta_1$, $\beta_2$, and $\beta_3$
for the model


V = Pi 



ft 






Figure 1.11 $\eta$ for Problem 1.13.

1.13 Consider the model (see Fig. 1.11)

‘>'o 

v-no ‘<io 

($\beta_1$ is positive.)

(a) Find and graph

(b) Find and graph $\partial\eta/\partial t_0$.

(c) Can $t_0$ and $\eta_0$ be simultaneously estimated using only two measurements
of $\eta$ if $\beta_1$ is known?



CHAPTER 


2 

Probability 


2.1 RANDOM HAPPENINGS 

If a room thermostat is set at 21°C, we do not expect the temperature
throughout the room, or even right at the thermostat, to remain constant. 
Rather, we expect the temperature at any point to change continually and 
continuously while remaining very near 21°C. 

If we run a test of braking distance by repeatedly bringing a car to 55 
mph, then applying the brakes, we expect the distance covered after 
application of the brakes to differ from trial to trial no matter how we try 
to make sure that the road and wind conditions and pressure on the brake 
pedal are the same from trial to trial. We do hope to settle on some typical 
distance and perhaps on some measure of variability. In both cases, 
thermostat and braking distance, there are elements of stability and 
elements of randomness. 

Example 2.1.1 

As a simple example of randomness with an element of stability, let us observe 
successive determinations of percent defective in a sampling inspection of items 
from a production line. (The data to be exhibited were actually generated by 
computer simulation.) Successive items were inspected and declared to be either 
Good or Defective. The first two items were found to be Good, the third Defective, 
and so on. The results of the first 500 determinations are given in Table 2.1A. To
make a long series of such determinations easier to contemplate, Table 2.1B gives
the number of defectives found among the first n items inspected for various n up
to 15,000. As we take observations, the fraction defective fluctuates wildly at first,
but as the number of observations increases the fluctuations dampen. Note that,
although the position in the sequence at which each defective occurs seems quite
unpredictable, there is an element of stability in the number of defectives in large
numbers of successive trials.


Table 2.1A Ordinal Numbers of Defective Items
Among 500 Items Inspected

    3rd     5th     [illegible]   140th   186th
    187th   380th   411th         450th   485th


Table 2.1B Cumulative Numbers of Defective Items Among 15,000
Items Inspected

    Cumulative   Cumulative                Cumulative   Cumulative
    Number of    Number of    Fraction     Number of    Number of    Fraction
    Items        Defectives   Defective    Items        Defectives   Defective
    Inspected                              Inspected

         1            0        0.000         1,500          40        0.027
         2            0        0.000         2,000          52        0.026
         3            1        0.333         3,000          80        0.027
         4            1        0.250         4,000         115        0.029
         5            2        0.400         5,000         144        0.0288
        10            2        0.200        10,000         305        0.0305
        20            2        0.100        10,500         324        0.0309
        30            2        0.067        11,000         337        0.0306
        40            2        0.050        11,500         350        0.0304
        50            2        0.040        12,000         365        0.0304
       100            3        0.030        12,500         380        0.0304
       150            4        0.027        13,000         396        0.0305
       200            6        0.030        13,500         404        0.0299
       300            6        0.020        14,000         420        0.0300
       400            7        0.018        14,500         442        0.0305
       500           10        0.020        15,000         451        0.0301
     1,000           30        0.030



The concept of probability was developed to describe a property of an
experimental situation in which it is impossible to tell what outcome to
expect for any one trial but yet, in a long enough series of trials, the
fraction yielding a particular outcome seems fairly stable.
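The stabilizing behavior seen in Table 2.1B can be reproduced with a short simulation. The sketch below is not the program used to generate the table. The defective rate of 0.03 is an assumption suggested by the long-run fractions in the table, and the function name, checkpoints, and seed are hypothetical choices made for illustration.

```python
import random


def cumulative_fraction_defective(p_defective, n_items, checkpoints, seed=0):
    """Simulate inspections; return {n: fraction defective among the first n items}."""
    rng = random.Random(seed)      # fixed seed so that a run can be repeated
    wanted = set(checkpoints)
    defectives = 0
    fractions = {}
    for n in range(1, n_items + 1):
        if rng.random() < p_defective:   # this item is declared Defective
            defectives += 1
        if n in wanted:
            fractions[n] = defectives / n
    return fractions


if __name__ == "__main__":
    points = [10, 100, 1000, 5000, 15000]
    for n, frac in cumulative_fraction_defective(0.03, 15000, points).items():
        print(f"{n:6d}  {frac:.4f}")
```

Different seeds give different early fractions, while the fraction at 15,000 items stays close to the assumed rate.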






Example 2.1.2 

In the example of braking distance, the observations differ in an important way 
from those obtained in the case of fraction defective. Since the number of 
defectives in a sample of n observations must be an integer, there are only a finite 
number, n + 1, of possible values for fraction defective in the sample. These are
discrete values and the variate, fraction defective, is called a discrete variate. In the 
braking distance example, the intrinsic variate is continuous, that is, for any two 
possible values of the variate there is another possible value between them. The 
recorded data by contrast are rounded, probably recorded to a fixed number of 
decimal places in whatever units of measurement we have chosen to use. Although 
this digitalization reduces the problem to one involving only a discrete variate, it is 
usually convenient to develop methods of analysis as though we were dealing with 
the intrinsic continuous variate. 

Example 2.1.3 

Another example of stability in randomness is that of a number of temperature 
sensors measuring the temperature at various (fixed) points in a room which is 
equipped with thermostatic control. A continuous plot of the output of three such 
thermocouples might look like the curves in Fig. 2.1. This figure, in one sense, 
constitutes a single observation on the manner in which temperature is controlled 
by the thermostat. A similar record beginning at another point in time would 
constitute another observation. We look for characteristics which are common to 
all records, which describe the manner in which temperature responds to the 
control of the thermostat. 



Figure 2.1 Continuous temperature measurements of air temperature. 


Much of this book is concerned with analyzing data from dynamic
processes with the aid of some mathematical model of the phenomena.
These models contain parameters that are to be estimated with the aid of
measurements. Digital data acquisition equipment used in dynamic
measurements, besides recording rounded data, records only at discrete
points in time. The methods of estimation discussed in this book deal with
problems in which data are recorded at discrete (not necessarily equally
spaced) points in time.


2.2 EVENTS 

2.2.1 Events. Random Variables and Probabilities 

Three probabilistic aspects of an experiment are (a) the outcome, the
observed result; (b) the event, the category, of interest to the experimenter,
into which the observation falls; and (c) the probability associated with the
event.

Example 2.2.1 

For simplicity let us use for illustration the experiment of tossing a penny three
times. For convenience let us use the symbols HHH, HHT, HTH, HTT, THH,
THT, TTH, TTT to represent the eight possible outcomes, aspect (a). Each
three-letter symbol represents in an obvious way the results, in order, of the three
tosses. Each experiment of tossing the penny three times results in one and only
one of the set of outcomes. The set of all possible outcomes constitutes the sample
space.

As for aspect (b), we may be interested in whether the same face shows up on all
three tosses. In this case we are interested in whether the outcome is one of the
outcomes HHH and TTT. In the language of probability we say that the event
"same face on all three tosses" is the subset {HHH, TTT} of the sample space.
Likewise, if we are interested in whether exactly two tosses yield tails we are
interested in the event {HTT, THT, TTH}. An event is a subset of the sample space.

If we associate a number (or a vector) with each outcome we form a random
variable. Most of the events in which we shall be interested can be described in
terms of random variables. In the three tosses of a penny, the number of heads is a
random variable with value 0 associated with outcome TTT, 1 with THT, and so
on. The set of outcomes to which we attach a given number form an event. Thus
"the number of heads is one" is the event {HTT, THT, TTH}.

A summary of terms introduced in this section is given in Tables 2.2 and
2.3.
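The three ideas of Example 2.2.1 (sample space, event, random variable) map directly onto simple data structures. The following Python sketch is only an illustration; the names `sample_space`, `number_of_heads`, `same_face`, and `one_head` are chosen here and do not come from the text.

```python
from itertools import product

# Sample space: all ordered results of three tosses, for example "HHT".
sample_space = ["".join(t) for t in product("HT", repeat=3)]


def number_of_heads(outcome):
    """A random variable: a number determined by the outcome."""
    return outcome.count("H")


# Events are subsets of the sample space.
same_face = {o for o in sample_space if o in ("HHH", "TTT")}
one_head = {o for o in sample_space if number_of_heads(o) == 1}

print(sample_space)
print(sorted(same_face))
print(sorted(one_head))
```

Defining an event through a random variable, as `one_head` does, is the pattern used for most events later in the chapter.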

The third probabilistic aspect of an experiment is the probability. If we
consider the eight simple events in the sample space above as equally
likely, we attach a probability of 1/8 to each. Sometimes our sense of
symmetry leads us to such an assignment of probability but often it does
not. If we ask the probability that the next item from the production line



Table 2.2 Terms Related to Events


Experiment: something which generates an observation. It may or may
not require action on the part of the experimenter.

Outcome (also simple event, sample point): one of the set of possible
observations which result from an experiment. One and only one
outcome results from one realization of the experiment.

Sample space: the set of all possible outcomes which may result from a
given experiment.

Event: a subset of the sample space. An event is said to occur if any
outcome in the event occurs.

Random variable: a number (or vector) determined by an outcome,
that is, a function defined on the points of the sample space.

Probability: see Table 2.3.


Table 2.3 Axioms of Probability

A function $P(E)$ defined on a set of events is called a probability if

(a) For any event $A$,

$$0 \le P(A) \le 1 \tag{2.2.1}$$

(b) The probability that the outcome of an experiment is some one of
the outcomes which make up the sample space is 1. The probability
that the outcome of an experiment is not one of the outcomes which
make up the sample space is 0.

(c) If the events $A_1, A_2, \ldots$ are disjoint, that is, no two can occur
simultaneously,

$$P(A_1 \text{ or } A_2 \text{ or } \cdots) = P(A_1) + P(A_2) + \cdots \tag{2.2.2}$$

In particular, if $A_1$ and $A_2$ are disjoint,

$$P(A_1 \text{ or } A_2) = P(A_1) + P(A_2) \tag{2.2.3}$$

Immediate consequences of these axioms are

(d) $$P(\text{not } A) = 1 - P(A) \tag{2.2.4}$$

(e) $$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \tag{2.2.5}$$




which we inspect will be defective, we shall almost certainly not wish to
associate a probability of 1/2 with this event. Probabilists leave the word
probability undefined but require that the assignment of probabilities to
events satisfy certain restrictions (see Table 2.3). In essence, it is required
that probabilities act like relative frequencies. The probability of a sure
thing (i.e., certainty) is one. The probability of the impossible event is zero.
The probability of one or the other of two disjoint events (events that
cannot happen simultaneously) is the sum of the probabilities of the two
events.

If events A and B are not disjoint, that is, “A and B” is a possible event,
it is clear that the probability of “A or B” is not the sum of the
probability of A and the probability of B. Let us see what can be said
about the probability of the union of any two events, disjoint or not. Now,
the event A is the union of the disjoint events “A and B” and “A and not
B” and the event B is the union of the disjoint events “A and B” and “B
and not A,” whereas the event “A or B” is the union of the disjoint events
“A and B,” “A and not B,” and “B and not A.” Thus

$$P(A \text{ or } B) = P(A \text{ and } B) + P(A \text{ and not } B) + P(B \text{ and not } A)$$

$$P(A) = P(A \text{ and } B) + P(A \text{ and not } B)$$

$$P(B) = P(A \text{ and } B) + P(B \text{ and not } A)$$

Hence

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \tag{2.2.5}$$
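Relation (2.2.5) can be checked by direct counting on the three-toss sample space. The sketch below assumes the eight outcomes are equally likely, and the two events A and B are arbitrary examples chosen here, not events used in the text.

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]


def prob(event):
    """Probability of an event when all eight outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


A = {o for o in outcomes if o[0] == "H"}          # first toss is a head
B = {o for o in outcomes if o.count("T") == 2}    # exactly two tails

lhs = prob(A | B)                         # P(A or B)
rhs = prob(A) + prob(B) - prob(A & B)     # right member of (2.2.5)
print(lhs, rhs)
```

Because A and B share the outcome HTT, simply adding P(A) and P(B) would count that outcome twice, which is what the subtracted term corrects.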

2.2.2 Discrete and Continuous Sample Spaces and Associated Probabilities

The experiment of tossing a coin three times led us to consider a finite
sample space consisting of eight outcomes. In contrast, the experiment of
tossing a coin until a head appears has a denumerably infinite number of
outcomes, H, TH, TTH, TTTH, .... The random variable “number of
tosses” defined on these discrete outcomes is a discrete random variable as
was “number of heads” in the earlier experiment. The probability of an
event in a discrete sample space is the sum of the probabilities of the
simple events which constitute the event in question. If we attach
probabilities 1/2, 1/4, 1/8, etc., respectively, to the sample points H, TH,
TTH, ..., the probability of the event “the number of tosses is odd” is

$$P(\text{odd number of tosses}) = P(\mathrm{H}) + P(\mathrm{TTH}) + P(\mathrm{TTTTH}) + \cdots = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots = \frac{1/2}{1 - 1/4} = \frac{2}{3}$$
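The value 2/3 can be confirmed numerically by adding the first few terms of the series. The helper name `prob_odd_tosses` below is hypothetical, and exact fractions are used only to keep rounding out of the illustration.

```python
from fractions import Fraction


def prob_odd_tosses(n_terms):
    """Partial sum of P(H) + P(TTH) + P(TTTTH) + ... using n_terms terms."""
    # The outcome that takes k tosses has probability (1/2)**k; keep odd k only.
    return sum(Fraction(1, 2 ** k) for k in range(1, 2 * n_terms, 2))


for n in (1, 2, 5, 20):
    print(n, float(prob_odd_tosses(n)))
```

Each additional term is one quarter of the previous one, so the partial sums approach 2/3 quickly.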

For some problems, appropriate sample spaces may be continuous. If the 
outcome is the weight of some object, the number of pounds may be any 
positive number (in some interval). For an example in which the physical 
experiment suggests probabilities, consider picking a number between 0 
and 1 (including 0 but not 1) by spinning a pointer about the center of a 
circle of circumference 1 . The distance along the perimeter in the direction 
of spin to the point designated by the pointer when it stops spinning is a 
random number. In dealing with problems involving continuous mass, the 
mass at a given point is zero; similarly, the probability of choosing any 
particular number between 0 and 1 is zero. If the probability is uniformly
distributed over the interval [0,1), the probability of drawing a number
between $a$ and $b$, $0 \le a < b < 1$, is $b - a$.

For another example, suppose a double-headed pointer is spun about 
the center of a circle of radius 1. Suppose the number we record is the 
distance along a tangent to the circle from the point of tangency to the 
indicated point, the distance being taken as positive if in one direction 
along the tangent and negative if in the other. The probability that the
directed distance found will be between $a$ and $b$, where $a < b$, might be
considered to be $(\tan^{-1} b - \tan^{-1} a)/\pi$; see Fig. 2.2.
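The arctangent expression follows if the stopping angle of the pointer is taken to be uniformly distributed, which is the assumption made in this sketch. The simulation below compares the formula with the observed fraction of spins; the function names, the number of spins, and the seed are illustrative choices.

```python
import math
import random


def exact_probability(a, b):
    """(arctan b - arctan a) / pi, the value suggested in the text."""
    return (math.atan(b) - math.atan(a)) / math.pi


def simulated_probability(a, b, n_spins=200_000, seed=1):
    """Fraction of simulated spins whose tangent distance lies in (a, b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_spins):
        # Assumption: the pointer angle is uniform on (-pi/2, pi/2).
        angle = rng.uniform(-math.pi / 2, math.pi / 2)
        if a < math.tan(angle) < b:
            hits += 1
    return hits / n_spins


if __name__ == "__main__":
    print(exact_probability(-1.0, 1.0), simulated_probability(-1.0, 1.0))
```

For the interval from -1 to 1 the formula gives exactly one half, since each arctangent is a quarter of π in magnitude.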



Figure 2.2 Double-headed pointer indicating a point a distance d in the positive direction
from the origin of measurement.




2.2.3 Assigned Probabilities and Experience with Chance Events

Actual experiments in coin tossing have shown that, in the long run, when
three coins are tossed repeatedly, HHH, HHT, HTH, HTT, THH, THT,
TTH, and TTT each occurs about equally often, although not exactly
equally often. The symmetry of the situation tends to make us believe that
1/8 is a characteristic of the experiment. It makes some sense to say that the
probability of getting three heads the next time three coins are tossed is 1/8.
In the experiment with stopping distances, the probability of stopping
within 110 m is some number that can be approximated by observing, in a
long series of similar experiments, the relative frequency of stops in less
than 110 m.


2.3 PROBABILITY DISTRIBUTIONS

2.3.1 Univariate Probability Distributions. Distribution Functions

For any random variable defined on any sample space, the total probability,
1, is distributed over the possible values of the random variable. There
are several ways in which a probability distribution may be described. For a
discrete random variable, the probability of each possible value of the
random variable may be given by formula or by table. For example, the
probability of $y$ defectives in a sample of 25 from a given process might be
taken to be*

$$P(Y = y) = \binom{25}{y} (0.01)^y (0.99)^{25-y}, \qquad y = 0, 1, \ldots, 25 \tag{2.3.1}$$

or we may give the probability for each specific $y$ as in Table 2.4 (where
they are rounded to four decimal places).

For the distribution of a continuous random variable, we may give the
probability that the value of the random variable falls in a given interval.
(The probability that it takes on a particular value is zero.) Thus we may
suggest that the probability that a particular piece of equipment will have a
lifetime of between $a$ and $b$ hours might in a particular case be represented
by




*$\binom{n}{r} = \dfrac{n(n-1)\cdots(n-r+1)}{1 \cdot 2 \cdots r}$, $\binom{n}{0} = 1$.





Table 2.4 Probabilities of Various Numbers of Defectives, y, in a
Sample of 25 Taken from a Production Process
Producing Defectives Randomly at a Rate of 1%
[see (2.3.1)]

    y      P(Y = y)
    0      0.7778
    1      0.1964
    2      0.0238
    3      0.0018
    4      0.0001
    5      0.0000
    ...
    25     0.0000
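The entries of Table 2.4 can be regenerated from the sample size 25 and the 1% rate stated in the table caption. This is a minimal sketch that assumes the binomial form for (2.3.1); the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb


def binomial_pmf(y, n=25, p=0.01):
    """P(Y = y) for y defectives in a sample of n, with defective rate p."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)


for y in range(6):
    print(y, f"{binomial_pmf(y):.4f}")
```

Summing the function over y = 0, 1, ..., 25 returns 1 to within rounding, as required of a probability distribution.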


The probability distribution of a continuous random variable is seldom
described in this way since either of two alternative manners of description
is simpler. Since $P(a < Y < b)$ for a continuous random variable can
always be written as an integral with limits $a$ and $b$, we can completely
describe the distribution by giving the integrand. Thus for the equipment
life distribution above, the integrand is


/rW = { 



y>0 

y<0 


(2.3.3) 


This probability density function, as such an integrand is called, is pictured
in Fig. 2.3.

An alternative description of the distribution of a continuous random
variable is that of giving $P(Y \le y)$, called the distribution function* of the
random variable and symbolized by $F_Y(y)$. The subscript is often omitted
when the random variable is clear. Thus

$$F_Y(y) = P(Y \le y) = \int_{-\infty}^{y} f_Y(u)\,du \tag{2.3.4}$$


*The word “distribution” refers to the manner in which probabilities are associated with
various events in the sample space. The phrase “distribution function” is used only in


Figure 2.3 Probability density function (2.3.3). Abscissa: hours.

and for the case of equipment life




0 

l-e 


>•<0 

j '>0 


(2 3 5) 


This distribution function is pictured in Fig. 2.4. (When we talk of a
continuous distribution we mean that the distribution function is
absolutely continuous.) Discrete distributions can also be represented by
distribution functions. Thus in the case of the distribution of number of



Figure 2.4 Distribution function (2.3.5).





Figure 2.5 Distribution function (2.3.6) of number of defectives (Table 2.4). Abscissa: number of defectives.

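defectives in a sample of 25,

$$F_Y(y) = \begin{cases} 0, & y < 0 \\ \displaystyle\sum_{i=0}^{[y]} \binom{25}{i} (0.01)^i (0.99)^{25-i}, & 0 \le y < 25 \\ 1, & y \ge 25 \end{cases} \tag{2.3.6}$$

$[y]$ means the integral part of $y$. This distribution function is pictured in
Fig. 2.5.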
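The step-function character of a discrete distribution function is easy to see in code. The sketch below evaluates a distribution function of the form (2.3.6) for any real argument; the name `binomial_cdf` and its default arguments (sample size 25, rate 1%, as in Table 2.4) are illustrative assumptions.

```python
from math import comb, floor


def binomial_cdf(y, n=25, p=0.01):
    """F_Y(y) = P(Y <= y) for the number of defectives in a sample of n."""
    if y < 0:
        return 0.0
    if y >= n:
        return 1.0
    top = floor(y)   # [y], the integral part of y
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(top + 1))


for y in (-0.5, 0.0, 0.9, 1.0, 1.5, 2.0):
    print(y, f"{binomial_cdf(y):.4f}")
```

The function is constant between integers and jumps at each integer by the probability of that value, which is the staircase shape described for Fig. 2.5.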

2.3.2 Multivariate Distributions 

The joint distribution of two or more random variables defined on the 
same sample space is a multivariate distribution. An example is the 
distribution connected with simultaneous observations of temperature, 
pressure, wind direction, and wind velocity. 

The distribution function of a bivariate distribution is $P(X \le x, Y \le y)$ and
is usually symbolized by

$$F_{X,Y}(x,y) = P(X \le x, Y \le y) \tag{2.3.7}$$

with subscripts often omitted if the random variables involved are clear
from context. If the two variables are absolutely continuous, there is a joint
density function

$$f_{X,Y}(x,y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x,y) \tag{2.3.8}$$





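If we have a bivariate distribution function $F_{X,Y}(x,y)$ we can readily get
the distribution function of $X$ or $Y$. Since $F_{X,Y}(x,\infty)$ is the probability that
“$X \le x$, $Y$ = any real number whatever,”

$$F_X(x) = F_{X,Y}(x,\infty) \tag{2.3.9}$$

and

$$F_Y(y) = F_{X,Y}(\infty,y) \tag{2.3.10}$$

If we start with a bivariate density function we find

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy \tag{2.3.11}$$

and

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx \tag{2.3.12}$$

Similarly we have

$$F_{X_1,X_3}(x_1,x_3) = F_{X_1,X_2,X_3}(x_1,\infty,x_3) \tag{2.3.13}$$

and

$$f_{X_1,X_3}(x_1,x_3) = \int_{-\infty}^{\infty} f_{X_1,X_2,X_3}(x_1,x_2,x_3)\,dx_2 \tag{2.3.14}$$

and so on.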



When we have a distribution involving several variables the distribution
of a proper subset of them is called a marginal distribution.

Table 2.5 gives a bivariate distribution with both variables discrete.
Marginal distributions of X and Y are at the bottom and at the right side
of the table, respectively. (Calculations were carried to more decimal
places and rounded to four. Sums rounded to four places are not always
identical to sums of numbers rounded to four places.) Table 2.6 gives the
corresponding distribution function. Each entry is the sum of the entries in
Table 2.5 which occupy the same position in the table plus all those above
and to the left. Thus the entry for $x = 2$, $y = 1$ is .3884 which, except for
rounding, is equal to .0156 + .0076 + .1763 + .0761 + .0776 + .0353 (= .3885).
The marginal distribution functions are at the bottom and at the right side
of the table.
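Entries of Tables 2.5 and 2.6 can be recomputed from the trinomial formula given with Table 2.6. The sketch below uses the 20 trials and the probabilities .02 and .1 stated in the table captions; the function names are hypothetical, and `joint_cdf` is written for integer arguments only.

```python
from math import factorial

N, P_FAIL, P_PARTIAL = 20, 0.02, 0.1


def joint_pmf(x, y):
    """P(X = x failures and Y = y partial successes) in N trials."""
    if x < 0 or y < 0 or x + y > N:
        return 0.0
    coef = factorial(N) // (factorial(x) * factorial(y) * factorial(N - x - y))
    p_rest = 1 - P_FAIL - P_PARTIAL      # probability of neither, .88
    return coef * P_FAIL ** x * P_PARTIAL ** y * p_rest ** (N - x - y)


def marginal_x(x):
    """Marginal probability of x failures (bottom margin of Table 2.5)."""
    return sum(joint_pmf(x, y) for y in range(N + 1))


def marginal_y(y):
    """Marginal probability of y partial successes (right margin of Table 2.5)."""
    return sum(joint_pmf(x, y) for x in range(N + 1))


def joint_cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y) for integer x and y (Table 2.6)."""
    return sum(joint_pmf(i, j) for i in range(x + 1) for j in range(y + 1))


if __name__ == "__main__":
    print(f"{joint_pmf(0, 0):.4f}  {marginal_x(0):.4f}  {joint_cdf(2, 1):.4f}")
```

Carrying full precision and rounding only at the end is what produces .3884 rather than the .3885 obtained by adding rounded entries.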

Table 2.7 gives a bivariate density function with both variables continuous,
the corresponding distribution function, the marginal distribution
functions, and the marginal density functions. The details of deriving the
other entries in the table from the joint density function are left as an
exercise. (The fact that the joint density function changes form along the
boundary of the region $0 < x < y$ must be taken into account in the
integrations.)



Table 2.5 Probabilities that in a Sequence of 20 Trials, x will Result
in Failure and y in Partial Success (Probability of Failure,
.02; Probability of Partial Success, .1)

                                  x
      y        0       1       2       3       4      Total
      0      .0776   .0353   .0076   .0010   .0001    .1216
      1      .1763   .0761   .0156   .0020   .0002    .2702
      2      .1903   .0778   .0150   .0018   .0002    .2852
      3      .1298   .0501   .0091   .0010   .0001    .1901
      4      .0627   .0228   .0039   .0004            .0898
      5      .0228   .0078   .0012   .0001            .0319
      6      .0065   .0021   .0003                    .0089
      7      .0015   .0004   .0001                    .0020
      8      .0003   .0001                            .0004
      9                                               .0001
    Total    .6676   .2725   .0528   .0065   .0006   1.0000


Table 2.6 Distribution Function of X and Y where X is the Number of
Failures and Y is the Number of Partial Successes in a Series of
20 Trials (Probability of Failure, .02; Probability of Partial
Success, .1)

$$F_{X,Y}(x,y) = \sum_{i=0}^{[x]} \sum_{j=0}^{[y]} \frac{20!}{i!\,j!\,(20-i-j)!} (.02)^i (.1)^j (.88)^{20-i-j}$$

                                      x
      y       0       1       2       3       4        5      ...     20
      0     .0776   .1128   .1204   .1215   .1216    .1216    ...    .1216
      1     .2538   .3652   .3884   .3914   .3917    .3917    ...    .3917
      2     .4441   .6334   .6716   .6765   .6769    .6769    ...    .6769
      3     .5739   .8133   .8606   .8665   .8670    .8670    ...    .8670
      4     .6366   .8987   .9499   .9562   .9568    .9568    ...    .9568
      5     .6593   .9293   .9817   .9881   .9887    .9887    ...    .9887
      6     .6658   .9378   .9906   .9970   .9976    .9976    ...    .9976
      7     .6673   .9397   .9925   .9990   .9995    .9996    ...    .9996
      8     .6676   .9400   .9929   .9993   .9999    .9999    ...    .9999
      9     .6676   .9401   .9929   .9994  1.0000   1.0000    ...   1.0000
     20     .6676   .9401   .9929   .9994  1.0000   1.0000    ...   1.0000




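Table 2.7 Example of Bivariate Density Function,
Corresponding Bivariate Distribution
Function, Marginal Distribution Functions,
and Marginal Density Functions

$$f_{X,Y}(x,y) = \begin{cases} \tfrac{1}{9} e^{-y/3}, & 0 < x < y \\ 0, & \text{otherwise} \end{cases}$$

$$F_{X,Y}(x,y) = \begin{cases} 1 - e^{-x/3} - \tfrac{1}{3} x e^{-y/3}, & 0 < x < y \\ 1 - e^{-y/3} - \tfrac{1}{3} y e^{-y/3}, & 0 < y < x \\ 0, & \text{otherwise} \end{cases}$$

$$F_X(x) = \begin{cases} 1 - e^{-x/3}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$F_Y(y) = \begin{cases} 1 - e^{-y/3} - \tfrac{1}{3} y e^{-y/3}, & y > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$f_X(x) = \begin{cases} \tfrac{1}{3} e^{-x/3}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$f_Y(y) = \begin{cases} \tfrac{1}{9} y e^{-y/3}, & y > 0 \\ 0, & \text{otherwise} \end{cases}$$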


2.3.3 Sample Paths

We sometimes have continuous records of some variables such as temperature
and velocity. Suppose, for example, that we are interested in several
aspects of deceleration of automobiles. A driver adjusts his car to a steady
speed of 55 mph, then applies his brakes. Measuring time from the
moment the brake pedal is applied, we obtain a continuous record of the
vehicle velocity and of the temperature of some point on the brake drum.
In the sample space of this experiment, a sample point is the pair of
continuous records of velocity and temperature. Since the sample point
can be pictured as a curve in three-dimensional space (velocity and
temperature against time), the sample point in such a case is referred to as
a sample path.

A sample path is an infinite dimensional random variable. We sometimes
conceive of our experiment as producing a sample path but actually
record only periodically, say, every millisecond. We are particularly
interested in random variables which are records of some continuous
phenomenon taken at discrete points in time. The time points may or may
not be equally spaced.

In dealing with simultaneous records over time our notation must take
into account both the variable recorded and the time which is of interest;
thus if we read velocity, $X$, at times $t_1$, $t_2$, and $t_3$ and temperature, $Y$, at $t_4$,
$t_5$, and $t_6$ we have a six-dimensional random variable $X(t_1)$, $X(t_2)$, $X(t_3)$,
$Y(t_4)$, $Y(t_5)$, $Y(t_6)$.


2.4 CONDITIONAL PROBABILITIES 

2.4.1 Conditional Distributions. Discrete Case 

In considering records of temperature in a room, we may be interested in
finding answers to conditional questions such as “If the temperature at one
particular point in the room is 21°C, what is the probability that, at the
same time, the temperature at another particular point in the room is less
than 20°C?” To handle this question we would need to build a model of
the interrelations of temperature at various points in the room and at
various times. Instead we illustrate concepts of conditional probability
with a much simpler example, that of Tables 2.5 and 2.6. Let us ask, if
there are no failures in the sample, what is the probability of two or more
partial successes? Recall that our theory of probability is designed to make
probabilities idealizations of relative frequencies. In this spirit let us
interpret the probabilities of Table 2.6 as relative frequencies. We call
attention to particular entries in Table 2.6 which we use in our illustration.
From the table we read directly $P\{X = 0, Y \le 1\} = .2538$, $P\{X = 0\} = .6676$,
$P\{Y \le 1\} = .3917$. From these we find $P\{X = 0, Y \ge 2\} = P\{X = 0\} - P\{X = 0, Y \le 1\}
= .6676 - .2538 = .4138$, $P\{Y \ge 2\} = .6083$. Now if these probabilities
were relative frequencies and if the total number of cases were 10,000,
we would have 6676 cases in which $X = 0$, of which 4138 were cases in
which $X = 0$ and $Y \ge 2$. The relative frequency of $Y \ge 2$ among cases of
$X = 0$ would be 4138/6676 = .6198. This relative frequency does not depend
on the number of repetitions of the experiment and may be calculated
directly from the probabilities of the event “$X = 0$ and $Y \ge 2$” and the
event “$X = 0$,” namely, .4138/.6676 = .6198. It seems reasonable to call
$P(X = 0, Y \ge 2)/P(X = 0) = .6198$ the conditional probability of $Y \ge 2$
given $X = 0$. We symbolize the conditional probability of the event $A$ given
the event $B$ by $P(A|B)$ and provisionally define

$$P(A|B) = \frac{P(A \text{ and } B)}{P(B)} \quad \text{if } P(B) \ne 0 \tag{2.4.1}$$

In our example we found $P\{Y \ge 2 | X = 0\} = .6198$ which may be compared
with $P\{Y \ge 2\} = .6083$. $P\{Y \ge 2 | X \ne 0\}$ turns out to be .5851. Thus
we see that the probability that $Y \ge 2$ depends on what we know about $X$
being equal to zero. In general,

$$\text{if } P\{A|B\} > P\{A\}, \text{ then } P\{A|\text{not } B\} < P\{A\} \tag{2.4.2}$$

If we were to go through the calculations we would find that $P\{X = 0 | Y \ge 2\} \ne P\{X = 0\}$
and, in general,

$$\text{if } P\{A|B\} \ne P\{A\}, \text{ then } P\{B|A\} \ne P\{B\} \tag{2.4.3}$$

Thus if the probability of $A$ depends on whether or not $B$ happens, the
probability of $B$ depends on whether or not $A$ happens; we can simplify
our description of the situation by saying that $A$ and $B$ are dependent.

If $A$ and $B$ are not dependent they are independent. If none of the
probabilities involved are zero (we have not yet considered what conditional
probability might mean if the denominator of the right member of
(2.4.1) were zero), the inequalities of the preceding paragraphs imply that if
$P\{A|B\} = P\{A\}$ then $P\{A|\text{not } B\} = P\{A\}$, $P\{B|A\} = P\{B\}$, and
$P\{B|\text{not } A\} = P\{B\}$. Several other equalities follow immediately. The
development of these equalities is left as an exercise. A symmetric form, and one
that does not depend on a definition of conditional probability and does
not require the exclusion of zero probabilities, is used as a definition of
independence: $A$ and $B$ are independent if

$$P\{A \text{ and } B\} = P\{A\} P\{B\} \tag{2.4.4}$$
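The conditional probabilities quoted in this section, and the test of independence (2.4.4), can be recomputed from the trinomial probabilities behind Tables 2.5 and 2.6. The sketch below is self-contained; the variable names are illustrative, with A standing for the event "Y is at least 2" and B for the event "X equals 0".

```python
from math import factorial

N, P_FAIL, P_PARTIAL = 20, 0.02, 0.1


def joint_pmf(x, y):
    """P(X = x failures and Y = y partial successes) in N trials."""
    if x < 0 or y < 0 or x + y > N:
        return 0.0
    coef = factorial(N) // (factorial(x) * factorial(y) * factorial(N - x - y))
    return coef * P_FAIL ** x * P_PARTIAL ** y * (1 - P_FAIL - P_PARTIAL) ** (N - x - y)


# Events: A = "Y >= 2", B = "X = 0".
p_b = sum(joint_pmf(0, y) for y in range(N + 1))
p_a = sum(joint_pmf(x, y) for x in range(N + 1) for y in range(2, N + 1))
p_a_and_b = sum(joint_pmf(0, y) for y in range(2, N + 1))

p_a_given_b = p_a_and_b / p_b                       # definition (2.4.1)
p_a_given_not_b = (p_a - p_a_and_b) / (1 - p_b)     # condition on "not B"

print(round(p_a_given_b, 4), round(p_a, 4), round(p_a_given_not_b, 4))
print("product rule (2.4.4) holds:", abs(p_a_and_b - p_a * p_b) < 1e-12)
```

The three printed probabilities correspond to the values .6198, .6083, and .5851 discussed in the text, and the product rule fails, so the two events are dependent.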


2.4.2 Conditional Distributions. Continuous Case

By (2.4.1),

$$P(X \le x \mid a < Y \le b) = \frac{P(X \le x,\ a < Y \le b)}{P(a < Y \le b)} \tag{2.4.5}$$

If the distribution function of $Y$ is continuous in the interval $(a, y]$ if $a < y$,
and if $f_Y(y) \ne 0$,

$$\lim_{a \to y} \frac{P(X \le x,\ a < Y \le y)}{P(a < Y \le y)}$$

exists. We call the limit the conditional distribution function for $X$ given
$Y = y$ and symbolize it by $F_{X|Y}(x|y)$. L'Hospital's rule gives

$$F_{X|Y}(x|y) = \frac{(\partial/\partial y) F_{X,Y}(x,y)}{f_Y(y)} \tag{2.4.6}$$





If $X$ is a continuous random variable and $y$ is a particular value of $Y$,
$(\partial/\partial x) F_{X|Y}(x|y)$ exists whether $Y$ is discrete or continuous. We call it the
conditional density function and symbolize it by $f_{X|Y}(x|y)$. It is

$$f_{X|Y}(x|y) = \frac{(\partial/\partial x)(\partial/\partial y) F_{X,Y}(x,y)}{f_Y(y)} = \frac{f_{X,Y}(x,y)}{f_Y(y)} \tag{2.4.7}$$

Table 2.8 gives some conditional distributions connected with the bivariate
random variable of Table 2.7.


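Table 2.8 Some Conditional Distributions Related to the Bivariate
Distribution of Table 2.7

$$f_{X,Y}(x,y) = \begin{cases} \tfrac{1}{9} e^{-y/3}, & 0 < x < y \\ 0, & \text{otherwise} \end{cases}$$

$$F_{X,Y}(x,y) = \begin{cases} 1 - e^{-x/3} - \tfrac{1}{3} x e^{-y/3}, & 0 < x < y \\ 1 - e^{-y/3} - \tfrac{1}{3} y e^{-y/3}, & 0 < y < x \\ 0, & \text{otherwise} \end{cases}$$

$$P(X \le x,\ u < Y \le v) = F_{X,Y}(x,v) - F_{X,Y}(x,u) = \tfrac{1}{3} x \left(e^{-u/3} - e^{-v/3}\right), \qquad 0 < x < u < v$$

$$P(u < Y \le v) = F_{X,Y}(\infty,v) - F_{X,Y}(\infty,u) = \tfrac{1}{3}\left[(3+u) e^{-u/3} - (3+v) e^{-v/3}\right]$$

$$P(X \le x \mid u < Y \le v) = \frac{x \left(e^{-u/3} - e^{-v/3}\right)}{(3+u) e^{-u/3} - (3+v) e^{-v/3}}, \qquad 0 < x < u < v$$

$$F_{X|Y}(x|y) = \lim_{u,v \to y} P(X \le x \mid u < Y \le v) = \frac{x}{y}, \qquad 0 < x < y,\ y > 0$$

$$f_{X|Y}(x|y) = \frac{\partial}{\partial x} F_{X|Y}(x|y) = \frac{1}{y}, \qquad 0 < x < y$$

or

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{y}, \qquad 0 < x < y$$

Note that $f_{X|Y}(x|y) = 0$ if $x < 0 < y$ or $0 < y < x$ and is undefined for $y < 0$.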
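The limit that defines the conditional distribution function can be examined numerically. The sketch below assumes the joint distribution function $F_{X,Y}(x,y) = 1 - e^{-x/3} - \tfrac{1}{3} x e^{-y/3}$ for $0 < x < y$ (with $x$ replaced by $y$ when $0 < y < x$), and uses a very large number in place of infinity; the function names and the stand-in value are assumptions made for illustration.

```python
import math


def joint_cdf(x, y):
    """Assumed F_{X,Y}(x, y): exponential form with scale 3, zero outside x, y > 0."""
    if x <= 0 or y <= 0:
        return 0.0
    if x < y:
        return 1 - math.exp(-x / 3) - (x / 3) * math.exp(-y / 3)
    return 1 - math.exp(-y / 3) - (y / 3) * math.exp(-y / 3)


def cond_prob(x, u, v):
    """P(X <= x | u < Y <= v), assuming 0 < x < u < v."""
    big = 1e9   # stands in for infinity when forming the marginal of Y
    numerator = joint_cdf(x, v) - joint_cdf(x, u)
    denominator = joint_cdf(big, v) - joint_cdf(big, u)
    return numerator / denominator


if __name__ == "__main__":
    x, y = 1.0, 4.0
    for h in (1.0, 0.1, 0.001):
        print(h, cond_prob(x, y - h, y + h), x / y)
```

As the interval around $y$ shrinks, the conditional probability approaches $x/y$, which says that given $Y = y$ the variable $X$ is uniformly distributed between 0 and $y$.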





2.4.3 Bayes’s Theorem

In some problems of parameter estimation we have some information
about the possible values of the parameters, and we may have some a
priori idea as to the probability distribution of the possible values of the
parameter. We should be able to combine this information with the
information provided by our observations in estimating the actual parameter
values. We assume that our model tells us the probabilities of obtaining
particular observations when the parameter is known. Bayes’s theorem tells
us how to combine the a priori probabilities of parameter values with the
probabilities of observations conditional on given parameter values to give
us a posterior set of probabilities of parameter values. The theorem itself is
more general, relating the probability of each of one set of disjoint events
conditional on each event in a second set to the probability of each event
of the second set conditional on each event in the first set. Before stating
the theorem we give a simple example to illustrate the theorem in the case
of discrete variates.

Example 2.4.1 

We assume four factories ($A_1$, $A_2$, $A_3$, $A_4$), each with the fixed fraction of the total
production of a particular product given in Table 2.9A. The product comes in three
colors ($E_1$, $E_2$, $E_3$). The fraction of the production of each factory devoted to each
color is given in Table 2.9B. Find the fraction of the production of each color
which is produced by each factory.


Table 2.9A Fraction of Total Output Produced by Each Factory

    Factory              A_1     A_2     A_3     A_4
    Fraction of total    0.40    0.32    0.20    0.08

Table 2.9B Fraction of Each Factory’s Output in Each Color

                          Factory
    Color      A_1     A_2     A_3     A_4
    E_1        0.30    0.50    0.60    0
    E_2        0.30    0.25    0.40    0
    E_3        0.40    0.25    0       1.00
    Total      1.00    1.00    1.00    1.00


Solution

The fraction of total production which falls into each of the 12 categories formed
by the four factories and the three colors is given in Table 2.9C. The fraction of





Table 2.9C Fraction of Total Output in Each Factory-Color Combination

                          Factory
    Color      A_1     A_2     A_3     A_4     Totals
    E_1        0.12    0.16    0.12    0       0.40
    E_2        0.12    0.08    0.08    0       0.28
    E_3        0.16    0.08    0       0.08    0.32
    Totals     0.40    0.32    0.20    0.08    1.00


Table 2.9D Fraction of Output in Each Color Produced by Each Factory 


Factory 


Color 

A\ 

^2 

■^3 

A^ 

Totals 

El 

0.30 

0.40 

0.30 

0 

1.00 

El 

0.43- 

0.29- 

0.29- 

0 

1.00 

El 

0.50 

0.25 

0 

0.25 

1.00 


total output in each color is found in the right margin of Table 2.9C. From the 
entries in Table 2.9C we can calculate the fraction of the production of each color 
which is produced by each factory. These fractions are given in Table 2.9D. A 
formula for obtaining the entries of Table 2.9D from those of Tables 2.9A and 2.9B 
is the content of Bayes’s theorem. 
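The arithmetic behind Tables 2.9C and 2.9D can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name `bayes_table` is hypothetical, and the A_2 column of Table 2.9B is taken as 0.50, 0.25, 0.25, the values that reproduce the A_2 entries of Table 2.9C and sum to 1.

```python
# Prior fractions P(A_i) from Table 2.9A (factories A1..A4).
prior = [0.40, 0.32, 0.20, 0.08]

# Conditional fractions P(E_k | A_i) from Table 2.9B: one row per factory,
# one entry per color E1..E3. Each row sums to 1.
# Assumption: the A2 row is 0.50, 0.25, 0.25 (consistent with Table 2.9C).
likelihood = [
    [0.30, 0.30, 0.40],  # A1
    [0.50, 0.25, 0.25],  # A2
    [0.60, 0.40, 0.00],  # A3
    [0.00, 0.00, 1.00],  # A4
]


def bayes_table(prior, likelihood):
    """Return (joint, color_totals, posterior) for discrete A_i and E_k."""
    n_a, n_e = len(prior), len(likelihood[0])
    # joint[k][i] = P(A_i and E_k) = P(A_i) P(E_k | A_i)      (Table 2.9C)
    joint = [[prior[i] * likelihood[i][k] for i in range(n_a)]
             for k in range(n_e)]
    # color_totals[k] = P(E_k), the right margin of Table 2.9C
    color_totals = [sum(row) for row in joint]
    # posterior[k][i] = P(A_i | E_k) = P(A_i and E_k) / P(E_k) (Table 2.9D)
    posterior = [[p / color_totals[k] for p in joint[k]]
                 for k in range(n_e)]
    return joint, color_totals, posterior


joint, color_totals, posterior = bayes_table(prior, likelihood)
for k, row in enumerate(posterior, start=1):
    print(f"E{k}:", [round(p, 2) for p in row])
```

Each printed row is one row of Table 2.9D. The division by `color_totals[k]` presumes that every color has a nonzero share of total output, which holds here.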

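We proceed in two steps to the statement of Bayes's theorem. First note
that (2.4.1) may be rewritten as

$$P(A_i \text{ and } E_k) = P(A_i)\,P(E_k \mid A_i) \tag{2.4.8}$$

and also

$$P(A_i \text{ and } E_k) = P(E_k)\,P(A_i \mid E_k) \tag{2.4.9}$$

Thus if $P(E_k) \neq 0$,

$$P(A_i \mid E_k) = \frac{P(A_i)\,P(E_k \mid A_i)}{P(E_k)} \tag{2.4.10}$$

This is in essence the content of Bayes's theorem. To calculate $P(E_k)$
note that if we have a collection of disjoint events $A_1, A_2, \ldots, A_n$, one of
which must occur, the events $E_k$ and $A_1$, $E_k$ and $A_2$, ..., $E_k$ and $A_n$ are
disjoint and

$$P(E_k) = P(E_k \text{ and } A_1) + P(E_k \text{ and } A_2) + \cdots + P(E_k \text{ and } A_n)$$
$$\phantom{P(E_k)} = P(A_1)P(E_k \mid A_1) + P(A_2)P(E_k \mid A_2) + \cdots + P(A_n)P(E_k \mid A_n) \tag{2.4.11}$$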






and hence

$$P(A_i \mid E_k) = \frac{P(A_i)\,P(E_k \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(E_k \mid A_j)} \tag{2.4.12a}$$

This is Bayes's theorem in detail. The computations required by this
formula are those displayed in Tables 2.9A-D. Values of $P(A_i)$ are given
in Table 2.9A, of $P(E_k \mid A_i)$ in Table 2.9B, of $P(A_i)P(E_k \mid A_i)$ and
$P(E_k) = \sum_i P(A_i)P(E_k \mid A_i)$ in Table 2.9C, and of $P(A_i \mid E_k)$ in Table 2.9D.
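For a continuous random variable we give the analogue of (2.4.12a) in a
form we shall use later:

$$g(\theta \mid x_1, \ldots, x_n) = \frac{g(\theta)\, f_{X_1,\ldots,X_n}(x_1, \ldots, x_n \mid \theta)}{\int g(u)\, f_{X_1,\ldots,X_n}(x_1, \ldots, x_n \mid u)\,du} \tag{2.4.12b}$$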


2.5 FUNCTIONS OF RANDOM VARIABLES

A function of a random variable is another random variable on the same
sample space. It is sometimes convenient, however, to make use of relations
between the two distributions. In some cases we find it convenient to
find explicit descriptions of the new random variable, and in some cases
we can find those properties of the new random variable which interest us
in terms of the distribution of the old random variable. First let us look at
some methods of describing the distribution of a random variable which is
a function of a random variable whose distribution is known.
If $X$ is a discrete random variable and if $Y = g(X)$, the probability $P_Y(y)$
is the sum of the probabilities $P_X(x)$ over those $x$'s for which $y = g(x)$.

Example 2.5.1

Given the distribution of the random variable X defined by the first two columns
of Table 2.10A and given that

$$Y = \tfrac{1}{3} X^2 (X^2 - 4)$$

find the distribution of Y.

Solution

For each x the value of $y = \tfrac{1}{3}x^2(x^2 - 4)$ is found and the probabilities of the
values of x which lead to each value of y are added to get the probabilities of each
value of y. These probabilities are given in Table 2.10B.

Table 2.10A Probability Function for X and Values of Y

   x     P_X(x)     y
  -3      .2       15
  -2      .3        0
  -1      .1       -1
   0      .2        0
   1      .1       -1
   2      .1        0

Table 2.10B Probability Function for Y

   y     P_Y(y)
  -1      .2
   0      .6
  15      .2
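The rule used in Example 2.5.1 (add the probabilities of all x that map to the same y) can be sketched as follows. The helper name `pushforward` is hypothetical, and exact fractions are used only so that the sums come out without rounding noise.

```python
from collections import defaultdict
from fractions import Fraction

# Probability function of X (first two columns of Table 2.10A).
p_x = {-3: Fraction(2, 10), -2: Fraction(3, 10), -1: Fraction(1, 10),
       0: Fraction(2, 10), 1: Fraction(1, 10), 2: Fraction(1, 10)}


def g(x):
    """y = x^2 (x^2 - 4) / 3, the function of Example 2.5.1."""
    return Fraction(x * x * (x * x - 4), 3)


def pushforward(p_x, g):
    """P_Y(y) = sum of P_X(x) over all x with g(x) = y."""
    p_y = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, p in p_x.items():
        p_y[g(x)] += p
    return dict(p_y)


p_y = pushforward(p_x, g)
for y in sorted(p_y):
    print(y, float(p_y[y]))
```

The printed pairs are the rows of Table 2.10B.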




To get the distribution function of $Y$ we need to find the probability
over all the values of $X$ such that $g(x) \le y$. In some cases the distribution
function of $Y$ is simply found in terms of the distribution function of $X$.

If $Y = g(X)$ and $g(\cdot)$ is a continuous, strictly increasing function, that is,
$x_2 > x_1$ implies $y_2 > y_1$, we have an inverse function $X = g^{-1}(Y)$ which is
also monotonic strictly increasing and

$$F_Y(y) = P(Y \le y) = P[g(X) \le y] = P[X \le g^{-1}(y)]$$
$$\phantom{F_Y(y)} = F_X[g^{-1}(y)] = F_X(x), \quad \text{where } x = g^{-1}(y) \tag{2.5.1}$$

If $X$ is a continuous random variable, $Y$ is a continuous random variable
and

$$f_Y(y) = f_X(x)\,\frac{dx}{dy} \tag{2.5.2}$$

again with $x = g^{-1}(y)$.


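Example 2.5.2

Let

$$f_X(x) = \begin{cases} \tfrac{2}{3}, & 0 < x < \tfrac{3}{2} \\ 0, & \text{otherwise} \end{cases}$$

or equivalently

$$F_X(x) = \begin{cases} 0, & x < 0 \\ \tfrac{2}{3}x, & 0 \le x < \tfrac{3}{2} \\ 1, & \tfrac{3}{2} \le x \end{cases}$$

Find the density function of $Y = 2X$ directly from the density function of X and its
distribution function from the distribution function of X.

Solution

$$f_Y(y) = f_X\!\left(\tfrac{y}{2}\right)\frac{dx}{dy} = \begin{cases} \tfrac{1}{3}, & 0 < y < 3 \\ 0, & \text{otherwise} \end{cases}$$

$$F_Y(y) = F_X\!\left(\tfrac{y}{2}\right) = \begin{cases} 0, & y < 0 \\ \tfrac{1}{3}y, & 0 \le y < 3 \\ 1, & 3 \le y \end{cases}$$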


If $Y = g(X)$ and $g$ is a monotonic strictly decreasing function, that is,
$x_2 > x_1$ implies $y_2 < y_1$, we again have an inverse function and

$$F_Y(y) = P[g(X) \le y] = P[X \ge g^{-1}(y)]$$

If $F_X$ is differentiable we have

$$f_Y(y) = -f_X[g^{-1}(y)]\,\frac{dx}{dy}$$

If $y = g(x)$ is monotonic but not strictly monotonic, $Y$ may have some
concentration of probability on certain values.
We give two examples in which the function $g(x)$ is not monotonic.

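Example 2.5.3

Given the distribution function of X, find the distribution function and the density
function of

$$Y = X^2 \tag{2.5.3}$$

Solution

For a continuous random variable X,

$$F_Y(y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \quad y > 0 \tag{2.5.4}$$

$$f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}, \quad y > 0 \tag{2.5.5}$$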





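Example 2.5.4

Given the density function of X

$$f_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2}\right]$$

find the density function of $Y = X^2$.

Solution

Using (2.5.5) we find

$$f_Y(y) = \frac{1}{\sqrt{2\pi y}} \exp\left[-\tfrac{1}{2}(y + \mu^2)\right] \cosh(\mu\sqrt{y}), \quad y > 0$$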
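A quick numerical cross-check of this result is sketched below. It assumes the reading of the example used here, namely that X is normal with mean μ and unit variance and Y = X², so that F_Y(y) = F_X(√y) − F_X(−√y). The function names and the value μ = 1.5 are illustrative choices only. The derivative of the distribution function, taken by a central difference, should agree with the closed-form density.

```python
import math


def normal_cdf(x, mu):
    """Distribution function of a normal variable with mean mu, variance 1."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))


def cdf_y(y, mu):
    """F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)) for Y = X^2 and y > 0."""
    r = math.sqrt(y)
    return normal_cdf(r, mu) - normal_cdf(-r, mu)


def pdf_y(y, mu):
    """Closed-form density of Y = X^2 for y > 0 (Example 2.5.4)."""
    return (math.exp(-(y + mu * mu) / 2.0) * math.cosh(mu * math.sqrt(y))
            / math.sqrt(2.0 * math.pi * y))


mu, h = 1.5, 1e-5  # hypothetical mean and finite-difference step
checks = []
for y in (0.5, 1.0, 4.0):
    # Central difference approximation to dF_Y/dy.
    slope = (cdf_y(y + h, mu) - cdf_y(y - h, mu)) / (2.0 * h)
    checks.append((slope, pdf_y(y, mu)))
    print(y, round(slope, 6), round(pdf_y(y, mu), 6))
```

The two printed columns agree to the displayed precision, which is what differentiating the distribution function should give.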


2.6 EXPECTATIONS
2.6.1 Expected Value

Consider a sequence of observations on a sequence of identically distributed
random variables. The average of the observed values in the sequence
is the sum of those values divided by the number of observations. If we
have a sufficiently long series of observations and a small enough number
of different values of the random variable, it is easier to note the number
of times each value appears and calculate the arithmetic average of the
observed values from these counts. This can be done by calculating the sum
of the products of the different values and their relative frequencies. The
arithmetic average of the observed values is thus the weighted average of the
possible values with weights equal to the observed relative frequencies of the
possible values. If we substitute probabilities for relative frequencies, we
get an idealization of arithmetic average, a weighted average of possible
values with weights equal to their probabilities. This weighted average,
which is not an arithmetic average, is called the expected value of the
random variable. There are two common notations used for expected value
of $X$: $E(X)$ and $\mu_X$. (The $\mu$, the Greek form of M, is an abbreviation for
mean.) If there are $n$ possible values for $X$: $x_1, x_2, \ldots, x_n$, the expected value
for this discrete random variable is

$$E(X) = \sum_{i=1}^{n} x_i\, P(X = x_i) \tag{2.6.1}$$

Example 2.6.1 

Find the expected number of defectives in a sample of 25 for a production process 
producing defectives randomly at a rate of 1%. 



Solution

Let us use Table 2.4 and (2.6.1) to find

$$E(X) = 0(.7778) + 1(.1964) + 2(.0238) + 3(.0018) + 4(.0001) + \cdots = 0.2498$$

(Unrounded probabilities would result in an answer $E(X) = 0.25$.) Over a long
period of inspecting samples of size 25 from a process producing 1% defectives, the
average number of defectives per sample is expected to be near 0.25. We cannot
obtain this value in a single experiment since the number of defectives must be an
integer. Even the average over a long experiment will almost certainly differ from
the expected value, although the difference may be very small.
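The sum in this example can be reproduced with a short sketch. It assumes that Table 2.4 lists binomial probabilities for n = 25 and a defective rate of 0.01; that assumption reproduces the rounded values quoted above. The helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x, n, theta):
    """P(X = x) for x defectives in n independent trials with rate theta."""
    return comb(n, x) * theta**x * (1.0 - theta)**(n - x)


n, theta = 25, 0.01
probs = [binomial_pmf(x, n, theta) for x in range(n + 1)]

# Expected value from unrounded probabilities, as in (2.6.1).
mean = sum(x * p for x, p in enumerate(probs))

# Same sum with probabilities rounded to four places, as in a printed table.
mean_rounded = sum(x * round(p, 4) for x, p in enumerate(probs))

print(round(mean, 4), round(mean_rounded, 4))
```

The first number is the exact expected value and the second is the value obtained from four-place probabilities, which is why the tabulated calculation falls slightly short of 0.25.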

If $X$ is a continuous random variable, we can consider the number we
would get by dividing the range of values of the random variable into a
number of disjoint intervals, calculating the probability of the random
variable falling into each interval, picking one value of the random
variable in each interval, and summing the products of each of these values
with the probability of its interval. If we consider a sequence of such sums
of products for sets of intervals for which the maximum length of any
interval decreases to zero, the limit of the sums is the Riemann integral of
$x$ times the density function,

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \tag{2.6.2}$$


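Example 2.6.2

Find the expected value of the random variable X with the density function

$$f(x) = \begin{cases} e^{-x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

Solution

Using (2.6.2),

$$E(X) = \int_0^{\infty} x e^{-x}\,dx = \left[-(x+1)e^{-x}\right]_0^{\infty} = 1$$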


Not all probability distributions have expected values. In fact, if the
angle θ of Fig. 2.2 is uniformly distributed between $-\pi/2$ and $\pi/2$,
the random variable d does not possess an expected value. However,
practical people adopt scales of measurement and models of physical
situations such that it is highly unlikely that nonexistence of an expected
value will cause trouble in a practical problem.





We can find the expected value of a function of a random variable, itself a 
random variable, by deriving the distribution of the values of the function 
and using the definition of expected values; or by associating the value of 
the function with each value of the original random variable, we may make 
use of the distribution of the original random variable. If X is a discrete 
random variable and Y=g(X), 

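$$E(Y) = \sum_{\text{all } y} y\,P(Y = y) = \sum_{\text{all } y} y\,P[g(X) = y]$$
$$\phantom{E(Y)} = \sum_{\text{all } y} y \sum_{\substack{\text{all } x \text{ for} \\ \text{which } g(x) = y}} P(X = x) = \sum_{\text{all } x} g(x)\,P(X = x) \tag{2.6.3a}$$

or

$$E[g(X)] = \sum_{\text{all } x} g(x)\,P(X = x) \tag{2.6.3b}$$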


Example 2.6.3 

Using the distributions of X and of Y of Table 2.10, find E(Y) by both (2.6.1) and
(2.6.3).

Solution

By (2.6.1), using Table 2.10B,

$$E(Y) = -1(.2) + 0(.6) + 15(.2) = -.2 + 3 = 2.8$$

By (2.6.3), using Table 2.10A,

$$E(Y) = 15(.2) + 0(.3) - 1(.1) + 0(.2) - 1(.1) + 0(.1) = 3 - .1 - .1 = 2.8$$

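Similar to (2.6.3) we have, for continuous random variables, if $Y = g(X)$,

$$E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \tag{2.6.4}$$

Example 2.6.4

Find $E(X^2)$ for the random variable with density function

$$f_X(x) = \begin{cases} e^{-x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

Solution

$$E(X^2) = \int_0^{\infty} x^2 e^{-x}\,dx = 2$$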





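In the next few paragraphs we present some properties of expected value
using continuous random variables as illustrations. The corresponding
forms for discrete random variables may be supplied by the reader.
The expected value of a function of $X$ and $Y$ can be found without first
finding explicitly the distribution of the function,

$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\,dx\,dy \tag{2.6.5}$$

If, for example, $g(X, Y) = X$,

$$E(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} x f_X(x)\,dx \tag{2.6.6}$$

The functional $E(\cdot)$ has the extremely important property of linearity,
that is,

$$E(aX + bY) = aE(X) + bE(Y) \tag{2.6.7}$$

To see this consider

$$E(aX + bY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (ax + by)\, f_{X,Y}(x, y)\,dx\,dy$$
$$= a\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\,dx\,dy + b\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_{X,Y}(x, y)\,dx\,dy$$
$$= aE(X) + bE(Y)$$

This formula, (2.6.7), extends to linear combinations of any finite
number of random variables. In words, the expected value of a linear
combination of random variables defined on the same sample space is that
same linear combination of expected values, that is,

$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) \tag{2.6.8}$$

A particularly important random variable is

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} \frac{1}{n} X_i \tag{2.6.9}$$

By (2.6.8),

$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) \tag{2.6.10}$$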



If $E(X_i) = \mu$ for $i = 1, \ldots, n$,

$$E(\bar{X}) = \sum_{i=1}^{n} \frac{1}{n}\,\mu = \mu \tag{2.6.11}$$

If $X$ and $Y$ are independent, possible values of $X$ being $x_1, \ldots, x_m$ and
possible values of $Y$ being $y_1, \ldots, y_n$,

$$E(XY) = \sum_{i=1}^{m}\sum_{j=1}^{n} x_i y_j\, P(X = x_i, Y = y_j)$$
$$\phantom{E(XY)} = \sum_{i=1}^{m}\sum_{j=1}^{n} x_i y_j\, P(X = x_i)\,P(Y = y_j) = E(X)E(Y) \tag{2.6.12}$$

(Be warned that it is possible for $E(XY)$ to be equal to $E(X)E(Y)$ if $X$
and $Y$ are not independent.)

The properties of expected values which we have demonstrated are
summarized in Table 2.11.


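Table 2.11 Properties of Expected Value

$E(X) = \sum_{\text{all } x} x\,P(X = x)$                       for discrete random variable X
$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$                   for continuous random variable X

$E(g(X)) = \sum_{\text{all } x} g(x)\,P(X = x)$                 for discrete r.v.
$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$             for continuous r.v.

$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$

In particular,

$E(aX + bY) = aE(X) + bE(Y)$

If $E(X_i) = \mu$ for all $i$, $E(\bar{X}) = \mu$.

If $X$ and $Y$ are independent, $E(XY) = E(X)E(Y)$.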



2.6.2 Variance, Covariance, and Correlation

The variance of a random variable $X$ is defined by

$$V(X) = E[(X - \mu_X)^2] = \sigma_X^2 \tag{2.6.13}$$

The nonnegative square root of $V(X)$ is called the standard deviation and
is symbolized $\sigma_X$. If the random variable involved is obvious from context,
the subscript is often suppressed.
Expansion of the square in the definition of $V(X)$ gives

$$V(X) = E(X - \mu)^2 = E(X^2 - 2X\mu + \mu^2)$$
$$\phantom{V(X)} = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$$

or in summary

$$V(X) = E(X^2) - [E(X)]^2 \tag{2.6.14}$$

Note that $V(X) \ge 0$ and is zero if and only if $X$ is a constant.

Example 2.6.5

Using the distribution of Table 2.4, find $E(X^2)$ and $V(X)$.

Solution

$$E(X^2) = 0(.7778) + 1(.1964) + 2^2(.0238) + 3^2(.0018) + 4^2(.0001) = 0.3094$$
$$E(X) = 0(.7778) + 1(.1964) + 2(.0238) + 3(.0018) + 4(.0001) = 0.2498$$

By (2.6.14), $V(X) = .3094 - (.2498)^2 = .2470$. (If the probabilities in the table had
been given exactly rather than rounded to four places we would have found
$E(X^2) = 0.31$, $E(X) = 0.25$, and $V(X) = 0.2475$.)

A dimensionless quantity known as the coefficient of variation is defined
to be

$$\text{coefficient of variation} = \frac{\sigma_X}{\mu_X}$$

It is a measure of variation measured in terms of the size of the expected
value.







Relating two random variables defined on the same sample space we
have the covariance

$$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] \tag{2.6.15}$$

or equivalently,

$$\mathrm{cov}(X, Y) = E(XY) - \mu_X \mu_Y \tag{2.6.16}$$

The correlation coefficient is defined by

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{2.6.17}$$

Note that $\mathrm{cov}(X, X) = V(X)$.

Variance is not a linear functional. In fact,

$$V(aX) = E[aX - E(aX)]^2 = E[aX - aE(X)]^2 = E\{a[X - E(X)]\}^2$$
$$\phantom{V(aX)} = E\{a^2[X - E(X)]^2\} = a^2 E[X - E(X)]^2 = a^2 V(X) \tag{2.6.18}$$


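Table 2.12 Properties of Variance, Covariance, Correlation Coefficient

$V(X) = E\{[X - E(X)]^2\} = \sigma_X^2$                                        (2.6.13)

$V(X) = E(X^2) - [E(X)]^2$                                                     (2.6.14)

$\mathrm{cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$                               (2.6.15)

$\mathrm{cov}(X, Y) = E(XY) - [E(X)][E(Y)]$                                    (2.6.16)

$V\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 V(X_i) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} a_i a_j\,\mathrm{cov}(X_i, X_j)$   (2.6.20)

$\phantom{V\left(\sum a_i X_i\right)} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{cov}(X_i, X_j)$

Specifically,

$V(aX) = a^2 V(X)$                                                             (2.6.18)

$V(aX + k) = a^2 V(X)$                                                         (2.6.21)

$V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,\mathrm{cov}(X, Y)$                   (2.6.22)

If $\{X_i\}$ are independent with $V(X_i) = \sigma^2$ for $i = 1, \ldots, n$,

$V(\bar{X}) = \sigma^2 / n$                                                    (2.6.19)

$\rho_{X,Y} = \mathrm{cov}(X, Y) / (\sigma_X \sigma_Y)$                        (2.6.17)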



Other relations are found in Table 2.12. We call attention particularly to
one to which repeated reference will be made. If $\{X_i\}$ are independent
with identical variances $\sigma^2$, then

$$V(\bar{X}) = \frac{\sigma^2}{n} \tag{2.6.19}$$


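Example 2.6.6

For the density function of Tables 2.7 and 2.8,

$$f_{X,Y}(x, y) = \begin{cases} \tfrac{1}{9} e^{-y/3}, & 0 < x < y \\ 0, & \text{otherwise} \end{cases}$$

find $\mathrm{cov}(X, Y)$ and the correlation coefficient for X and Y.

Solution

$$E(XY) = \int_0^{\infty}\int_0^{y} xy\,\tfrac{1}{9} e^{-y/3}\,dx\,dy = 27$$

$$\mathrm{cov}(X, Y) = 27 - (3)(6) = 9$$

$$E(X^2) = 18, \qquad E(Y^2) = 54$$

$$\sigma_X^2 = 18 - 3^2 = 9, \qquad \sigma_Y^2 = 54 - 6^2 = 18$$

$$\rho_{X,Y} = \frac{9}{\sqrt{9}\,\sqrt{18}} = \frac{1}{\sqrt{2}} \approx 0.707$$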


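The variance of a function (other than linear) of two or more independent
random variables is often difficult to determine in terms of the
variances of the arguments of the function. We give here one example of
one method of estimating such a variance, a method which has been in use
for decades and which has been well described by Kline and McClintock
(2). If a random variable $Z = XY$, the product of two independent random
variables, we might write $X$ as $\mu_x + \epsilon_x$, $Y$ as $\mu_y + \epsilon_y$, and $Z$ as $\mu_z + \epsilon_z$. We see
that

$$\mu_z + \epsilon_z = (\mu_x + \epsilon_x)(\mu_y + \epsilon_y)$$

Since $X$ and $Y$ are independent, $\mu_z = \mu_x \mu_y$. Subtracting $\mu_z$ from both sides
gives

$$\epsilon_z = \mu_y \epsilon_x + \mu_x \epsilon_y + \epsilon_x \epsilon_y$$

If $\epsilon_x$ and $\epsilon_y$ are guaranteed to be very small in comparison with $\mu_x$ and $\mu_y$,
we have approximately

$$\epsilon_z \approx \mu_y \epsilon_x + \mu_x \epsilon_y$$

and, taking the variance of both sides,

$$\sigma_z^2 \approx \mu_y^2 \sigma_x^2 + \mu_x^2 \sigma_y^2$$

All this suggests that since, if $z = f(x, y)$,

$$dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy,$$

we might consider as an approximation to $\sigma_z^2$

$$\left[\frac{\partial f(x, y)}{\partial x}\right]^2_{x=\mu_x,\,y=\mu_y} \sigma_x^2 + \left[\frac{\partial f(x, y)}{\partial y}\right]^2_{x=\mu_x,\,y=\mu_y} \sigma_y^2$$

Kline and McClintock give several examples of the use of this method for
two or more variates.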
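For the product Z = XY of independent variables the approximation can be compared with the exact variance, which follows from E(X²)E(Y²) − (μ_x μ_y)². The sketch below is an illustration only; the function names and the numerical means and standard deviations are hypothetical.

```python
def product_variance_linearized(mu_x, var_x, mu_y, var_y):
    """First-order approximation for f(x, y) = x * y:
    (df/dx)^2 var_x + (df/dy)^2 var_y, derivatives evaluated at the means."""
    return mu_y**2 * var_x + mu_x**2 * var_y


def product_variance_exact(mu_x, var_x, mu_y, var_y):
    """Exact V(XY) for independent X and Y:
    E(X^2) E(Y^2) - (mu_x mu_y)^2, expanded."""
    return mu_y**2 * var_x + mu_x**2 * var_y + var_x * var_y


# Hypothetical numbers: means 10 and 20, standard deviations 0.1 and 0.3.
mu_x, var_x, mu_y, var_y = 10.0, 0.1**2, 20.0, 0.3**2
approx = product_variance_linearized(mu_x, var_x, mu_y, var_y)
exact = product_variance_exact(mu_x, var_x, mu_y, var_y)
print(approx, exact, (exact - approx) / exact)
```

The only term the linearization drops is the product of the two variances, so the relative error is small whenever the standard deviations are small compared with the means, which is the condition stated in the text.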


2.6.3 Stochastic Processes. Autocovariance, Cross-covariance 

In dealing with the continuous record of temperature $X(t)$ as a function of
time we may be interested in the relation between $X(t)$ at time $t_1$ and at
time $t_2$ for individual records. The covariance between the joint random
variables $X(t_1)$ and $X(t_2)$, representing points on the same sample path at
different times, is called the autocovariance.

When we deal with two continuous stochastic processes, the covariance
between one of them for one point in time, $X(t_1)$, and the other at a point
in time possibly different, $Y(t_2)$, is called a cross-covariance.



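Example 2.6.7

Consider a simple example which illustrates the concept of autocovariance in
discrete time. Suppose the value of a random process at time i is 0.9 of its value at
time i - 1 plus the value of a random variable $\epsilon_i$,

$$X_i = 0.9 X_{i-1} + \epsilon_i \tag{2.6.23}$$

where all finite sets of $\epsilon_i$ are independent. Suppose that $X_0 = 0$, $P(\epsilon_i = -1) = .4$,
$P(\epsilon_i = 0) = .2$, $P(\epsilon_i = 1) = .4$ for all i. Note that

$$E(\epsilon_i) = -1(.4) + 0(.2) + 1(.4) = 0 \quad \text{for } i = 1, 2, \ldots$$
$$V(\epsilon_i) = 1(.4) + 0(.2) + 1(.4) = 0.8 \quad \text{for } i = 1, 2, \ldots$$

Three realizations of this process to i = 25 generated by computer are shown in Fig.
2.6. Find $E(X_i)$, $V(X_i)$, $\mathrm{cov}(X_i, X_j)$, and $\rho_{X_i, X_j}$.

Figure 2.6 Three realizations of the process of Example 2.6.7.

Solution

$$X_1 = \epsilon_1$$
$$X_2 = 0.9\epsilon_1 + \epsilon_2$$
$$X_3 = 0.9^2\epsilon_1 + 0.9\epsilon_2 + \epsilon_3$$
$$X_i = 0.9^{i-1}\epsilon_1 + 0.9^{i-2}\epsilon_2 + \cdots + \epsilon_i$$

$$E(X_i) = 0.9^{i-1}E(\epsilon_1) + 0.9^{i-2}E(\epsilon_2) + \cdots + E(\epsilon_i) = 0 \tag{2.6.24}$$

$$V(X_i) = E(0.9^{2i-2}\epsilon_1^2 + 0.9^{2i-4}\epsilon_2^2 + \cdots + \epsilon_i^2 + \text{terms each involving a product of two different } \epsilon\text{'s})$$
$$= E(0.9^{2i-2}\epsilon_1^2 + 0.9^{2i-4}\epsilon_2^2 + \cdots + \epsilon_i^2)$$
$$= (1 + 0.9^2 + \cdots + 0.9^{2i-2})\,V(\epsilon_1) = \frac{1 - 0.81^i}{1 - 0.81}(0.8) = \frac{80}{19}(1 - 0.81^i) \tag{2.6.25}$$

$$\mathrm{cov}(X_i, X_j \mid i < j) = E(0.9^{i+j-2}\epsilon_1^2 + 0.9^{i+j-4}\epsilon_2^2 + \cdots + 0.9^{j-i}\epsilon_i^2 + \text{terms each involving a product of two different } \epsilon\text{'s})$$
$$= 0.9^{j-i}\,V(X_i) = \frac{80}{19}\,0.9^{j-i}(1 - 0.81^i)$$

$$\mathrm{cov}(X_i, X_j) = \frac{80}{19}\,0.9^{|i-j|}\left(1 - 0.81^{\min(i,j)}\right) \tag{2.6.26}$$

$$\rho_{X_i, X_j} = \frac{0.9^{|i-j|}\left(1 - 0.81^{\min(i,j)}\right)}{\sqrt{\left(1 - 0.81^{\min(i,j)}\right)\left(1 - 0.81^{\max(i,j)}\right)}} = 0.9^{|i-j|}\sqrt{\frac{1 - 0.81^{\min(i,j)}}{1 - 0.81^{\max(i,j)}}} \tag{2.6.27}$$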
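The closed forms for this process can be checked against the recursion itself. Because $\epsilon_i$ is independent of $X_{i-1}$, the variance satisfies V(X_i) = 0.81 V(X_{i-1}) + 0.8 with V(X_0) = 0. The sketch below (function names are illustrative) compares that recursion with the geometric-series result and evaluates the covariance and correlation for a pair of times.

```python
def variance_recursive(i, a=0.9, var_eps=0.8):
    """V(X_i) from V(X_i) = a^2 V(X_{i-1}) + V(eps_i), starting at X_0 = 0."""
    v = 0.0
    for _ in range(i):
        v = a * a * v + var_eps
    return v


def variance_closed(i, a=0.9, var_eps=0.8):
    """Geometric-series form: var_eps (1 - a^(2i)) / (1 - a^2)."""
    return var_eps * (1.0 - a ** (2 * i)) / (1.0 - a * a)


def covariance(i, j, a=0.9, var_eps=0.8):
    """cov(X_i, X_j) = a^|i-j| V(X_min(i,j))."""
    return a ** abs(i - j) * variance_closed(min(i, j), a, var_eps)


def correlation(i, j, a=0.9, var_eps=0.8):
    """Correlation coefficient of X_i and X_j."""
    denom = (variance_closed(i, a, var_eps)
             * variance_closed(j, a, var_eps)) ** 0.5
    return covariance(i, j, a, var_eps) / denom


print(variance_recursive(25), variance_closed(25))
print(covariance(5, 10), correlation(5, 10))
```

As i grows the variance approaches 0.8/0.19 = 80/19, so late in the record the process behaves almost as if its variance did not depend on time.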


2.6.4 Stationarity 

Many interesting stochastic processes are such that the distribution of $X(t)$
does not depend on $t$ and the joint distribution of $X(t_1)$ and $X(t_2)$ for
$t_2 > t_1$ depends only on the difference $t_2 - t_1$. We call such processes
stationary. For instance, in a room with thermostatic control, the probability
that the temperature will be between 21 and 22°C at any particular
time in the future will be the same as for any other future instant. The
probability that the temperature will be between 21 and 22°C at some
particular time and between 22 and 24°C half an hour later will be the
same as for any other two times half an hour apart. We call a process on
$0 \le t < \infty$ a stationary process if the distribution of $X(t_1 + h), X(t_2 + h), \ldots,
X(t_n + h)$ is the same as the distribution of $X(t_1), X(t_2), \ldots, X(t_n)$ for
all $n$ and all $t_i \ge 0$, $i = 1, 2, \ldots, n$, and $h > 0$.

Processes that are not stationary may possess some of the aspects of
stationarity. If $E[X(t)] = E[X(t + h)]$ for all $h > 0$, $t \ge 0$ and $\mathrm{cov}[X(t_1), X(t_1 + h)]$
is a function of $h$ only, the process is said to be wide sense stationary. Since
a process which is stationary in the wide sense is not necessarily stationary,
the stationarity of a process is sometimes emphasized by calling a
stationary process strictly stationary.

For stationary processes, notation can be simplified unambiguously:

$$E[X(t)] = \mu_X \tag{2.6.28}$$

$$\mathrm{cov}[X(t_1), X(t_1 + h)] = C_X(h) \tag{2.6.29}$$

$$\mathrm{cov}[X(t_1), Y(t_1 + h)] = C_{XY}(h) \tag{2.6.30}$$

2.7 LAW OF LARGE NUMBERS. CENTRAL LIMIT THEOREM
2.7.1 Chebyshev's Inequality

We may ask why we think estimates based on observations should be any
more likely to be near the quantity being estimated than pure guesses.
Chebyshev's inequality tells us why we might hope. In this section we talk
about the Chebyshev inequality itself. In the next section we show how it
applies to averages.

In essence, Chebyshev's inequality says that if we use the standard
deviation of a random variable as a unit of measurement, the probability
of being far from the expected value is small. To be precise, for any
random variable $X$ which possesses a standard deviation $\sigma$, the probability
of $X$ being at least $k\sigma$ from the expected value $\mu$ cannot be more than
$1/k^2$. The Chebyshev inequality is expressed in symbols as

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \tag{2.7.1}$$

Before giving a proof, let us give the Chebyshev inequality in three other
forms for ease in reference:

$$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \tag{2.7.2}$$

$$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2} \tag{2.7.3}$$

$$P(|X - \mu| < \epsilon) \ge 1 - \frac{\sigma^2}{\epsilon^2} \tag{2.7.4}$$



We give a proof for the case of a discrete random variable. For a
continuous random variable replace sums with integrals.

By definition

$$\sigma^2 = \sum_{\text{all } x} (x - \mu)^2 p(x)$$

We may divide the possible $x$'s into two sets, set $A$ in which $|x - \mu| \ge k\sigma$
and set $B$ in which $|x - \mu| < k\sigma$. We have

$$\sigma^2 = \sum_{A} (x - \mu)^2 p(x) + \sum_{B} (x - \mu)^2 p(x)$$

Now

$$\sum_{B} (x - \mu)^2 p(x) \ge 0$$

and

$$\sum_{A} (x - \mu)^2 p(x) \ge k^2\sigma^2 \sum_{A} p(x) = k^2\sigma^2 P(|X - \mu| \ge k\sigma)$$

Thus

$$\sigma^2 \ge k^2\sigma^2 P(|X - \mu| \ge k\sigma)$$

which, when both sides are divided by $k^2\sigma^2$, is the Chebyshev inequality.
Notice that no particular probability distribution is assumed.
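Since the inequality holds for any distribution with a standard deviation, it can be tried on a small discrete case. The sketch below uses the distribution of Y from Table 2.10B as a convenient example; the function name `chebyshev_check` and the chosen values of k are illustrative.

```python
def chebyshev_check(pmf, k):
    """Return (P(|X - mu| >= k sigma), 1/k^2) for a discrete distribution
    given as a dict mapping value -> probability."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    sigma = var ** 0.5
    tail = sum(p for x, p in pmf.items() if abs(x - mu) >= k * sigma)
    return tail, 1.0 / k**2


# Distribution of Y from Table 2.10B.
pmf_y = {-1: 0.2, 0: 0.6, 15: 0.2}
results = {}
for k in (1.5, 2.0, 3.0):
    results[k] = chebyshev_check(pmf_y, k)
    tail, bound = results[k]
    print(k, tail, round(bound, 4))
```

In each row the tail probability is no larger than the bound. The bound is usually far from tight, which is the price paid for assuming nothing about the distribution.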

2.7.2 Weak Law of Large Numbers 

We have based our axioms of probability on the way relative frequencies 
work, expecting near agreement between probabilities and relative 
frequencies if the number of trials is large enough. We now look at a 
theorem in probability which sounds as though it ensures such agreement. 
Actually, we are only showing some sort of consistency in the theory. 

Consider an experiment, an event $A$ with probability $\theta$, and a random
variable $X$ which is 1 if $A$ happens and 0 otherwise. Independent repetitions
of the experiment constitute a super experiment which yields a
sequence of independent random variables. Let $Y$ be the average of the
values of $X$ over $n$ independent repetitions of the experiment; that is, $Y$ is
the fraction of the experiments in which $A$ happens. Since $Y$ is the average
of the $X$'s, $E(Y) = E(X) = \theta = P(A)$. Since $X$ has finite variance, $\sigma^2$, say,
we have by (2.6.19), $V(Y) = \sigma^2/n$. Using this variance in (2.7.4) we get

$$P(|Y - \theta| < \epsilon) \ge 1 - \frac{\sigma^2}{n\epsilon^2} \tag{2.7.5}$$

However small $\epsilon$, we can find an $n$ large enough that $P(|Y - \theta| < \epsilon)$ is as
close as we please to 1. Thus

$$\lim_{n \to \infty} P(|Y - P(A)| < \epsilon) = 1 \tag{2.7.6}$$

More generally, using the same type of argument, we have the weak law
of large numbers, that is, if $\{X_i\}$ is a set of independent identically
distributed random variables with expected value $\mu$ and variance $\sigma^2$,

$$P(|\bar{X} - \mu| < \epsilon) \ge 1 - \frac{\sigma^2}{n\epsilon^2} \tag{2.7.7}$$

$$\lim_{n \to \infty} P(|\bar{X} - \mu| < \epsilon) = 1 \tag{2.7.8}$$
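The bound in (2.7.5) can be turned around to ask how many repetitions guarantee a stated accuracy. The sketch below is an illustration under stated assumptions: the function name and the numbers are hypothetical, and the variance of the 0-1 variable is replaced by its largest possible value, 1/4, because θ is unknown. Exact fractions avoid rounding trouble in the ceiling.

```python
import math
from fractions import Fraction


def chebyshev_sample_size(variance, eps, delta):
    """Smallest n with 1 - variance / (n eps^2) >= 1 - delta,
    i.e. n >= variance / (delta eps^2), from the Chebyshev bound."""
    return math.ceil(variance / (delta * eps * eps))


# Hypothetical requirement: relative frequency within 0.01 of theta
# with probability at least 0.95, using the worst-case variance 1/4.
variance = Fraction(1, 4)
eps = Fraction(1, 100)
delta = Fraction(1, 20)
n = chebyshev_sample_size(variance, eps, delta)
print(n)
```

The resulting n is conservative, since the Chebyshev bound uses no information about the distribution beyond its variance.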


2.7.3 Central Limit Theorem

We state the central limit theorem without proof, since a proof seems to be
of no aid in understanding the theorem.
If $X_1, X_2, \ldots, X_n$ are independent identically distributed random
variables, each with expected value $\mu$ and variance $\sigma^2$, then

$$Y = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \tag{2.7.9}$$

has approximately the standard normal distribution, with density function

$$f(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \tag{2.7.10}$$

The approximation is as accurate as may be desired if $n$ is sufficiently
large. In this sense

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \tag{2.7.11}$$

may be said to be approximately distributed in accordance with the
density function

$$f(\bar{x}) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left[-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right] \tag{2.7.12}$$

Of course, the distribution of $Y$ will be exactly normal if the $X$'s are
normally distributed.


2.8 EXAMPLES OF DISTRIBUTIONS 
2.8.1 Bernoulli Distributions 

The distribution which consists of 0 with probability $1 - \theta$ and 1 with
probability $\theta$ is known as the Bernoulli distribution.

A Bernoulli random variable is one which has (1) two possible values, 0
and 1, and for which (2) the probability that it takes on the value 1 is $\theta$.

   x     P(X = x)
   0     1 - θ
   1     θ

or

$$P(X = x) = \theta^x (1 - \theta)^{1-x}, \quad x = 0, 1 \tag{2.8.1}$$

The expected value and variance of a Bernoulli random variable are

$$\mu = \theta, \qquad \sigma^2 = \theta(1 - \theta) \tag{2.8.2}$$

The Bernoulli distribution was used implicitly earlier in connection with 
the weak law of large numbers. 

2.8.2 Binomial Distributions 

The sum of a fixed number of independent Bernoulli random variables is
said to have a binomial distribution.* Since a binomial random variable is
the sum of independent identically distributed Bernoulli random variables,
the expected value of a binomial random variable is $n\theta$ and its variance is
$n\theta(1 - \theta)$. The individual binomial probabilities are somewhat more
difficult to compute. They turn out to be

$$P(X = x) = \frac{n!}{x!\,(n - x)!}\,\theta^x (1 - \theta)^{n-x} \tag{2.8.3}$$

*The probabilities of various values for the random variable are the terms of the binomial
expansion of $[(1 - \theta) + \theta]^n$.

A binomial random variable is

1. the sum of
2. a fixed number n of
3. independent
4. Bernoulli random variables which have
5. a common value of θ.

The expected value and variance of a binomial random variable are

$$\mu = n\theta \tag{2.8.4}$$

$$\sigma^2 = n\theta(1 - \theta) \tag{2.8.5}$$


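2.8.3 Poisson Distributions

If arrivals are random in time, that is, if $P(X(t) = x)$ is the probability of
$x$ arrivals between time zero and time $t$, and if

$$\lim_{\Delta t \to 0} \frac{P[X(t + \Delta t) - X(t) = 1]}{\Delta t} = \lambda, \quad t > 0, \ \Delta t > 0,$$

and

$$\lim_{\Delta t \to 0} \frac{P[X(t + \Delta t) - X(t) \ge 2]}{\Delta t} = 0, \quad t > 0, \ \Delta t > 0, \tag{2.8.6}$$

then $X$ has a Poisson distribution

$$P(X(t) = x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \quad x = 0, 1, 2, \ldots \tag{2.8.7}$$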

,>0 , 

=0 1 

(2 8 8) 


The distribution of time between arrivals is an exponential distribution (see
Section 2.8.7). Note that if we are concerned with only one time, we can
take this time as a unit and the symbol $t$ may be omitted from the formula
above.

The distribution function is

$$F(x) = \sum_{y=0}^{x} \frac{(\lambda t)^y e^{-\lambda t}}{y!} = \frac{1}{x!}\int_{\lambda t}^{\infty} u^x e^{-u}\,du, \quad x = 0, 1, 2, \ldots \tag{2.8.9}$$

$$\mu = \lambda t \tag{2.8.10}$$

$$\sigma^2 = \lambda t \tag{2.8.11}$$

If $X_1, \ldots, X_n$ are independently distributed with Poisson distributions
with expected values $\lambda t_1, \ldots, \lambda t_n$, respectively, $X_1 + \cdots + X_n$ has a
Poisson distribution with expected value $\lambda(t_1 + \cdots + t_n)$.
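The additivity property just stated can be checked numerically for two variables by summing over all ways of splitting a total count. This is a sketch only: it takes the Poisson probability function to be mean^x e^(-mean) / x!, and the rate and interval lengths are hypothetical.

```python
import math


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson random variable with the given expected value."""
    return mean**x * math.exp(-mean) / math.factorial(x)


def convolve(p, q, x):
    """P(X1 + X2 = x) for independent X1 ~ p and X2 ~ q on 0, 1, 2, ..."""
    return sum(p(k) * q(x - k) for k in range(x + 1))


lam, t1, t2 = 2.0, 0.5, 1.5  # hypothetical rate and interval lengths
pairs = []
for x in range(8):
    lhs = convolve(lambda k: poisson_pmf(k, lam * t1),
                   lambda k: poisson_pmf(k, lam * t2), x)
    rhs = poisson_pmf(x, lam * (t1 + t2))
    pairs.append((lhs, rhs))
    print(x, round(lhs, 6), round(rhs, 6))
```

The two columns agree, as expected for counts of arrivals in adjacent, non-overlapping intervals.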

2.8.4 Uniform Distributions

If the probability that $X$ is in an interval $(x_L, x_U)$ is proportional to the
length of the interval for $x_L$ and $x_U$ both inside an interval $(a, b)$, then $X$ is
said to have a uniform distribution over the interval $(a, b)$. Its probability
density function is

$$f_X(x) = \begin{cases} 0, & x < a \\ \dfrac{1}{b - a}, & a < x < b \\ 0, & x > b \end{cases} \tag{2.8.12}$$

and its distribution function

$$F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x < b \\ 1, & x \ge b \end{cases} \tag{2.8.13}$$

$$\mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12} \tag{2.8.14}$$

Figures 2.7 and 2.8 illustrate the approach to the normal distribution of
the distribution of averages of independent observations from a uniform
population. In Fig. 2.7 the horizontal scale is the average of $n$ observations.
In Fig. 2.8 the scale is $\sqrt{n}$ times the average, a scale which makes the
standard deviation of the distributions equal.


2.8.5 Normal Distributions 


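The distribution that appears most frequently in this book both as a
distribution in its own right and as an approximation to other distributions
is the normal distribution. We have seen the normal distribution in the
central limit theorem (Section 2.7.3). A random variable with a normal
distribution with mean $\mu$ and variance $\sigma^2$ [abbreviated $N(\mu, \sigma^2)$] has the
density function

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right] \tag{2.8.15}$$

and the distribution function

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} \exp\left[-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2\right] dv \tag{2.8.16}$$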



Figure 2.7 Probability density functions for average of n independent observations from a
uniform distribution.

The standard normal density $\varphi(z)$ and the standard normal distribution
function $\Phi(z)$ are defined to be

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad \Phi(z) = \int_{-\infty}^{z} \varphi(u)\,du \tag{2.8.17}$$

respectively; $z$ is referred to as a standard normal deviate. We can express
$f(x)$ and $F(x)$ in terms of $\varphi$ and $\Phi$. Thus

$$f(x) = \frac{1}{\sigma}\,\varphi\!\left(\frac{x - \mu}{\sigma}\right), \qquad F(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) \tag{2.8.18}$$

Normal density functions with $\mu = 0$ and $\sigma = 0.4$, 1.0, and 2.5 are shown
in Fig. 2.9. $\varphi(z)$ is pictured in Fig. 2.10 and $\Phi(z)$ in Fig. 2.11.



Figure 2.9 Normal probability density functions.

Figure 2.11 Standard normal distribution function.

Table 2.13 gives selected values of $\Phi(z)$. More extensive tables are
readily available.

Table 2.13 Standard Normal Distribution Function

    z       Φ(z)         z        Φ(z)
   0        .5          1.282     .90
   0.1      .5398       1.5       .9332
   0.2      .5793       1.645     .95
   0.3      .6179       1.960     .975
   0.4      .6554       2.0       .9772
   0.5      .6915       2.326     .99
   0.6      .7257       2.5       .9938
   0.8      .7881       2.576     .995
   1.0      .8413       3.0       .9987

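If $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$, the
probability of $X$ being between $a$ and $b$ is

$$P(a < X < b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right) \tag{2.8.19}$$

If $X_1, X_2, \ldots, X_n$ are independent random variables each normally
distributed with mean $\mu$ and variance $\sigma^2$,

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \tag{2.8.20}$$

is normally distributed with mean $\mu$ and variance $\sigma^2/n$. In other words,

$$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} \tag{2.8.21}$$

has distribution function $\Phi(z)$.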
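Values of the standard normal distribution function such as those in Table 2.13 can be computed from the error function, using the identity Φ(z) = ½[1 + erf(z/√2)]. The sketch below relies on Python's standard `math.erf`; the function names are illustrative.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_prob(a, b, mu, sigma):
    """P(a < X < b) for X normal with mean mu and standard deviation sigma,
    computed as Phi((b - mu)/sigma) - Phi((a - mu)/sigma)."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)


for z in (0.5, 1.0, 1.645, 1.960, 2.576):
    print(z, round(std_normal_cdf(z), 4))

# Probability that an average of n = 25 observations from N(10, 2^2)
# (hypothetical numbers) lies within 0.5 of the mean.
n, mu, sigma = 25, 10.0, 2.0
p_within = normal_interval_prob(mu - 0.5, mu + 0.5, mu, sigma / math.sqrt(n))
print(round(p_within, 4))
```

The last calculation uses the fact that the average of n independent normal observations has standard deviation σ/√n, so the interval is ±1.25 standard deviations of the average.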


2.8.6 Multivariate Normal Distributions 

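Of the many possible generalizations of the normal distribution, the one
which is called the bivariate normal is that with density function

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}$$
$$\times \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x - \mu_x}{\sigma_x}\right)\left(\frac{y - \mu_y}{\sigma_y}\right) + \left(\frac{y - \mu_y}{\sigma_y}\right)^2\right]\right\} \tag{2.8.22}$$

Not only are the marginal distributions of the variates normal but all
conditional distributions of one variate for fixed value of the other are
normal. The choice of symbols for the parameters indicates exactly the role
these parameters play in the joint distribution. The correlation coefficient
between $X$ and $Y$ is $\rho$.

The marginal distribution of $X$ has the density function

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_x}{\sigma_x}\right)^2\right] \tag{2.8.23}$$

The conditional distribution of $Y \mid X$ is

$$f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1 - \rho^2}} \exp\left\{-\frac{\left[y - \mu_y - \rho\,\dfrac{\sigma_y}{\sigma_x}(x - \mu_x)\right]^2}{2\sigma_y^2(1 - \rho^2)}\right\} \tag{2.8.24}$$

showing incidentally that the expected value of $Y$ conditional on $X = x$ is

$$\mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x - \mu_x) \tag{2.8.25}$$

and its conditional variance is

$$\sigma_y^2(1 - \rho^2) \tag{2.8.26}$$

A little algebra will disclose that, if $X$ and $Y$ have a bivariate normal
distribution, the random variable $aX + bY$ is normal with mean

$$a\mu_x + b\mu_y \tag{2.8.27}$$

and variance

$$a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\rho\sigma_x\sigma_y \tag{2.8.28}$$

In general, an $n$-variate normal has density function of the form

$$f(x_1, \ldots, x_n) = K \exp\left[-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}(x_i - \mu_i)(x_j - \mu_j)\right] \tag{2.8.29}$$

where $K$ is a normalizing constant. The relation between the $a_{ij}$'s and the
variances and covariances of the $X_i$'s is difficult to present without using
matrix concepts. In Chapter 6, (6 J 66)
gives the multivariate normal density function in terms of the
variance-covariance matrix of the $X$'s.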

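2.8.7 Gamma Distributions

A random variable $Y$ with density function

$$f_Y(y) = \begin{cases} \dfrac{y^{\alpha - 1} e^{-y/\lambda}}{\lambda^{\alpha}\,\Gamma(\alpha)}, & y > 0 \\ 0, & \text{otherwise} \end{cases} \tag{2.8.30}$$

is said to have a gamma distribution with parameters $\alpha$ and $\lambda$. Its
distribution function is

$$F_Y(y) = \frac{1}{\lambda^{\alpha}\,\Gamma(\alpha)} \int_0^{y} u^{\alpha - 1} e^{-u/\lambda}\,du, \quad y > 0 \tag{2.8.31}$$

$$\mu = \alpha\lambda \tag{2.8.32}$$

$$\sigma^2 = \alpha\lambda^2 \tag{2.8.33}$$

The parameter $\alpha$ need not be an integer. One application in which it is
restricted to integral values is that in which $Y$ is the time to the $\alpha$th arrival
under the conditions described in Section 2.8.3.

If $Y_1, \ldots, Y_n$ are independently distributed with gamma distributions
with parameters $(\alpha_1, \lambda), (\alpha_2, \lambda), \ldots, (\alpha_n, \lambda)$, then $Y_1 + \cdots + Y_n$ has a gamma
distribution with parameters $(\alpha_1 + \cdots + \alpha_n, \lambda)$.

A gamma distribution with parameters $(1, \lambda)$ is said to be an exponential
distribution with expected value $\lambda$.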


2.8.8 Chi-squared Distributions 

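If $Y$ has a gamma distribution with parameters $(\nu/2, 2)$ it is said to have a
$\chi^2$ (chi, ch as in Christmas) distribution with $\nu$ degrees of freedom. Its
density and distribution functions are

$$f_{\chi^2}(y) = \begin{cases} \dfrac{y^{\nu/2 - 1} e^{-y/2}}{2^{\nu/2}\,\Gamma(\nu/2)}, & y > 0 \\ 0, & \text{otherwise} \end{cases} \tag{2.8.34}$$

and

$$F_{\chi^2}(y) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \int_0^{y} u^{\nu/2 - 1} e^{-u/2}\,du \tag{2.8.35}$$

respectively.

If $X_1, \ldots, X_n$ are independently normally distributed with expected values
$\mu_1, \ldots, \mu_n$, respectively, and variances $\sigma_1^2, \ldots, \sigma_n^2$, respectively, the sum

$$\sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 \tag{2.8.36}$$

has a $\chi^2$ distribution with $\nu = n$ degrees of freedom.

If $X_1, \ldots, X_n$ are independently normal with common expected value $\mu$
and common variance $\sigma^2$, and if

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

then

$$\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \tag{2.8.37}$$

has a $\chi^2$ distribution with $\nu = n - 1$ degrees of freedom. Sums such as
(2.8.37) frequently occur in parameter estimation.

If $Y_1, \ldots, Y_n$ are independently distributed as $\chi^2$ with $\nu_1, \ldots, \nu_n$ degrees of
freedom, respectively, $Y_1 + \cdots + Y_n$ is distributed as $\chi^2$ with $\nu_1 + \cdots + \nu_n$
degrees of freedom.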

The distribution of ^nd the distribution of are frequently of 
interest They are so closely related to that it is usual to use tables of 
F~i '( ) in calculations related to them 
The density functions of $\chi^2$ for 1, 2, 3, 4, 5, and 10 degrees of freedom are
given in Fig. 2.12.

Figure 2.12 Density functions for $\chi^2$ distributions (degrees of freedom as indicated).

To introduce a symbolism for the inverse of the distribution function of
$\chi^2$ with $\nu$ degrees of freedom, let $\chi^2_\gamma(\nu)$ be the number for which the
distribution function of $\chi^2$ with $\nu$ degrees of freedom has value $\gamma$. A few
values of $\chi^2_\gamma(\nu)$ are given in Table 2.14.


Table 2.14 Some Values of $\chi^2_\gamma(\nu)$

Degrees of                                 $\gamma$
Freedom      .01    .025     .05      .1      .5      .9     .95    .975     .99
    1       0.00    0.00    0.00    0.02    0.45    2.71    3.84    5.02    6.63
   10       2.56    3.25    3.94    4.87    9.34   15.99   18.31   20.48   23.21
   15       5.23    6.26    7.26    8.55   14.34   22.31   25.00   27.49   30.58
   30      14.95   16.79   18.49   20.60   29.34   40.26   43.77   46.98   50.89
   60       37.5    40.5    43.2    46.5    59.3    74.4    79.1    83.3    88.4
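
The entries of Table 2.14 can be checked numerically. The following is a minimal sketch, not the method used to produce the table: it evaluates (2.8.34) directly and (2.8.35) through the power series of the incomplete gamma function, then inverts the distribution function by bisection. The function names `chi2_pdf`, `chi2_cdf`, and `chi2_quantile` are hypothetical, chosen for illustration.

```python
import math


def chi2_pdf(y, nu):
    """Density (2.8.34) of a chi-squared variable with nu degrees of freedom."""
    if y <= 0:
        return 0.0
    a = nu / 2.0
    # Work on the log scale so large nu does not overflow the gamma function.
    return math.exp((a - 1.0) * math.log(y) - y / 2.0
                    - a * math.log(2.0) - math.lgamma(a))


def chi2_cdf(y, nu, tol=1e-15, max_terms=10_000):
    """Distribution function (2.8.35) via the series
    P(a, x) = x^a e^{-x} / Gamma(a) * sum_k x^k / (a (a+1) ... (a+k)),
    with a = nu/2 and x = y/2."""
    if y <= 0:
        return 0.0
    a, x = nu / 2.0, y / 2.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_quantile(gamma, nu):
    """Solve chi2_cdf(q, nu) = gamma for q by bisection (0 < gamma < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, nu) < gamma:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for nu in (1, 10, 15, 30, 60):
        print(nu, [round(chi2_quantile(g, nu), 2) for g in (0.05, 0.5, 0.95)])
```

The series converges for every positive argument, which makes it adequate for the moderate degrees of freedom tabled here; production code would normally switch to a continued fraction for large arguments.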





2.8.9 t Distributions 


If $X$ has a normal distribution with mean 0 and variance 1, and $Y$ is
independently distributed as $\chi^2$ with $\nu$ degrees of freedom, the statistic $T$,

$$T = \frac{X}{\sqrt{Y/\nu}} \qquad (2.8.38)$$

has a distribution known as a $t$ distribution with $\nu$ degrees of freedom. The
density function of $T$ is

$$f_T(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\pi\nu}\;\Gamma(\nu/2)} \cdot \frac{1}{(1+t^2/\nu)^{(\nu+1)/2}}, \qquad -\infty < t < \infty \qquad (2.8.39)$$

As $\nu$ increases without limit, the distribution of $t$ approaches the standard normal
distribution as a limit.
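
The limiting behavior can be seen numerically. The sketch below evaluates the density (2.8.39) on the log scale and compares it with the standard normal density for increasing $\nu$; the function names `t_pdf` and `normal_pdf` are hypothetical and used only for this illustration.

```python
import math


def t_pdf(t, nu):
    """Density (2.8.39) of the t distribution with nu degrees of freedom."""
    # lgamma avoids overflow of Gamma[(nu+1)/2] for large nu.
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(t * t / nu))


def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for nu in (1, 5, 30, 120, 10_000):
        gap = max(abs(t_pdf(k / 10.0, nu) - normal_pdf(k / 10.0))
                  for k in range(-40, 41))
        print(f"nu = {nu:6d}: largest density difference on [-4, 4] = {gap:.6f}")
```

For $\nu = 1$ the density reduces to $1/[\pi(1+t^2)]$, the Cauchy density, which is the widest curve in Fig. 2.13.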

Density functions for $t$ with 1, 5, and $\infty$ degrees of freedom are
displayed in Fig. 2.13.

To introduce a symbolism for the inverse of the distribution function of
$t$ with $\nu$ degrees of freedom, let $t_\gamma(\nu)$ be the number for which the
distribution function of $t$ with $\nu$ degrees of freedom has value $\gamma$. A few
values of $t_\gamma(\nu)$ are given in Table 2.15.



Figure 2.13 Density functions for $t$ distributions (degrees of freedom as indicated).





Table 2.15 Some Values of $t_\gamma(\nu)$

Degrees of                        $\gamma$
Freedom        .5      .8      .9     .95    .975     .99    .995
    1         .00    1.38    3.08    6.31   12.71   31.82   63.66
   10         .00    0.88    1.37    1.81    2.23    2.76    3.17
   15         .00    0.87    1.34    1.75    2.13    2.60    2.95
   30         .00    0.85    1.31    1.70    2.04    2.46    2.75
   60         .00    0.85    1.30    1.67    2.00    2.39    2.66
  120         .00    0.84    1.29    1.66    1.98    2.36    2.62
$\infty$      .00    0.84    1.28    1.64    1.96    2.33    2.58


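The statistic

$$\frac{\sqrt{n}\,(\bar Y-\mu)}{S} \qquad (2.8.40)$$

where $Y_1,\ldots,Y_n$ are independently normally distributed with common
expected value $\mu$ and common variance $\sigma^2$, and where

$$\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i \quad\text{and}\quad S^2 = \frac{\sum_{i=1}^{n}(Y_i-\bar Y)^2}{n-1} \qquad (2.8.41)$$

has a $t$ distribution with $\nu = n - 1$ degrees of freedom. Although somewhat
unusual, the capital $S$ is used in this section to clearly indicate that $S$ is a
random variable.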

Many statistics of importance in parameter estimation have $t$ distributions.

Under the conditions stated in connection with (2.8.40), the statistic

$$\frac{\sqrt{n}\,(\bar Y-a)}{S} \qquad (2.8.42)$$

is said to have a noncentral $t$ distribution with noncentrality parameter
$(\mu-a)\sqrt{n}/\sigma$.


2.8.10 F Distributions

If $U$ has a $\chi^2$ distribution with $\nu_1$ degrees of freedom (d.f.) and $V$ has an
independent $\chi^2$ distribution with $\nu_2$ d.f.,

$$F = \frac{U/\nu_1}{V/\nu_2} \qquad (2.8.43)$$

has an $F$ distribution with $\nu_1$ and $\nu_2$ d.f.





The F distribution is used frequently in parameter estimation concerned 
with model building. 

To introduce a symbolism for the inverse of the distribution function of
$F$ with $\nu_1$ and $\nu_2$ degrees of freedom, let $F_\gamma(\nu_1,\nu_2)$ be the number for which
the distribution function of $F$ with $\nu_1$ degrees of freedom in the numerator
sum of squares and $\nu_2$ degrees of freedom in the denominator sum of
squares has value $\gamma$. A few values of $F_{.95}(\nu_1,\nu_2)$ are given in Table 2.16.


Table 2.16 $F_{.95}(\nu_1,\nu_2)$

                                  $\nu_1$
$\nu_2$          1       10       30       60      120   $\infty$
    1       161.45   241.88   250.10   252.20   253.25   254.31
   10         4.96     2.98     2.70     2.62     2.58     2.54
   30         4.17     2.16     1.84     1.74     1.68     1.62
   60         4.00     1.99     1.65     1.53     1.47     1.39
  120         3.92     1.91     1.55     1.43     1.35     1.25
$\infty$      3.84     1.83     1.46     1.32     1.22     1.00


If $X$ has an $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom, $X^{-1}$ has an
$F$ distribution with $\nu_2$ and $\nu_1$ degrees of freedom.

2.8.11 Tables and Computer Programs for Commonly Used Statistics 

Programs for calculating values of the distribution functions of normal, $t$,
$\chi^2$, and $F$ distributions are available for many computers including some
microcomputers. Expansions which are tabled in Abramowitz and Stegun
[1] are the basis for most such programs.

Many collections of tables of percentiles of normal, $t$, $\chi^2$, and $F$
distributions are available. All of the collections of references [3] through
[9] contain these and more. The two handbooks also contain extensive
collections of formulas used in statistical computation.


REFERENCES 


1. Abramowitz, M. and Stegun, I. A., editors, Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables, National Bureau of Standards Applied
Mathematics Series No. 55, Washington, D.C., 1964.

2. Kline, S. J. and McClintock, F. A., "Describing Uncertainties in Single-sample
Experiments," Mech. Eng. 75 (1953), 3-8.

3. Owen, D. B., Handbook of Statistical Tables, Addison-Wesley Publishing Co., Reading,
Mass., 1962.

4. Pearson, E. S. and Hartley, H. O., editors, Biometrika Tables for Statisticians, 3rd ed.,
University Press, Cambridge, 1966.

5. Fisher, R. A. and Yates, F., Statistical Tables for Biological, Agricultural and Medical
Research, 6th rev. ed., Oliver and Boyd, London, 1963; reprinted by Hafner Publishing
Co., Darien, Conn., 1974.

6. Beyer, W. H., editor, CRC Handbook of Tables for Probability and Statistics, 2nd ed.,
Chemical Rubber Co., Cleveland, 1968.

7. Burington, R. S. and May, D. C., Jr., Handbook of Probability and Statistics with Tables,
2nd ed., McGraw-Hill Book Company, New York, 1970.

8. Hald, A., Statistical Tables and Formulas, John Wiley & Sons, Inc., New York, 1952.

9. Arkin, H. and Colton, R. R., Tables for Statisticians, 2nd ed., Barnes & Noble, Inc., New
York, 1963.


PROBLEMS 


2.1 Three parts are subject to large stresses in starting a machine at the
beginning of the working day. Let A be the event that Part A fails, B the
event that Part B fails, C the event that Part C fails. P(A) = .04, P(B) = .02,
P(C) = .03, P(A and B) = .008, P(A and C) = .007, P(B and C) = .01, P(A
and B and C) = .0001. Find P(A or B), P(A or not B), P(A or B or C),
P(neither A nor B).

2.2 If

$f_Y(y) = $ ... for $0 < y < 1$, and 0 otherwise,

find $F_Y(y)$.

2 J Consider the joint probability function given below 


^ 

X2 0 1 

0 A ~2b 

1 2A 4h 

2 3h tiA 

3 4ft 8ft 


2 

~3h 

~7h 

9ft 


(a) What IS A’ 

(A) Give the joint distnbuboa function 
Fwit tiui tnargnal probability functions 
(c) Are Y| and Xj independeniv 

Answer, (partial) For respectively 





2.4 If

$$f_{X,Y}(x,y) = \begin{cases} 8xy, & 0 < x < y < 1 \\ 0, & \text{otherwise} \end{cases}$$

find $F_{X,Y}(x,y)$, $f_X(x)$, $f_Y(y)$.

2.5 For the joint probability density function of Problem 2.4, give the conditional
probability density function $f_{X|Y}(x|y)$.

Answer. $2x/y^2$ for $0 < x < y < 1$; 0 for $0 < y < x < 1$; undefined otherwise.

2.6 If 

/x,yU..)')=|Q^-^’ 

find/x|y(x|y). 

2.7 A test for the presence of a pollutant which causes the quality of output to 
deteriorate is not infallible. It falsely indicates the presence of the pollutant 
with probability .05, and it fails to detect the presence of the pollutant with 
probability .01. From our experience, we would judge the probability of the 
pollutant being present is .01. If we get a positive indication from the test, 
what is the probability that the pollutant is present? 

2.8 If the actual diameters of shot vary uniformly over the interval (.200, .205), 
what is the distribution of volume? 

2.9 Define the new random variable $Z$ which is related to the random variable $X$
by

$$Z = 2 + 4X + 2X^2$$

where $X$ has the probability function $P(X=1) = \tfrac12$, $P(X=2) = \tfrac14$,
$P(X=4) = \tfrac14$ and $P(X=x) = 0$ otherwise. Find the mean of $Z$.

Answer. 21

2.10 Consider the joint probability density for X, Y, and Z, 

x> 0 ; a ,>0 
a > 0; > 0 

(a) Find A in terms of and 03 , 

(b) Find E{X) = iix- 

(c) Are X, Y, and Z independent? 

(d) Give the joint distribution function for X, Y, and Z. 


0< X < y < 1 
otherwise 





2 11 Compute the mean and vanance for the discrete probability functions given 
below 

/»(Ar=o)*=, 

P(;¥=«ai)=0 for 1 


=•0 otherwise 

Answer 4 * 

(f) P(A'-^)-e forjr-0 12 

■0 otherwise 

Answer 2 2 

2 12 Compute the mean and vanance of (he distnbunons whose density function 
are 

<a) 


(f»> 


Answer 0 , 

<c) 


W W<J 
otherwise 


/U).((2A)'’. '■« <>0 

1 0 other 


Answer V2/n I — — 

V 

2.13 Find the variance of $Z$ of Problem 2.9. (How does it compare with the
variance of $X$?)





2.14 For the joint probability function given below, find the covariance between
X and Y.

                 X
 Y       1      2      3      4
 0       0     0.1     0     0.1
 1      0.1    0.2     0     0.1
 2      0.2    0.1    0.1     0

Answer. -.34

2.15 If $F = \sum_{i=1}^{n} a_i Y_i$, and

$$E\{[Y_i - E(Y_i)][Y_j - E(Y_j)]\} = \begin{cases} \sigma^2, & i = j \\ 0, & i \ne j \end{cases}$$

prove that $V(F) = \sigma^2 \sum_{i=1}^{n} a_i^2$.

2.16 For the distribution of Problem 2.4 find the covariance of X and Y. Find the
coefficient of correlation between X and Y.

2.17 If $Y_i = Y_{i-1} + \epsilon_i$, $i = 1,2,\ldots$, $Y_0 = 0$, where the $\epsilon_i$ are normal, independently
distributed with zero expected value and variance $\sigma^2$, find the correlation between $Y_i$
and $Y_{i-1}$.

Answer. $\sqrt{(i-1)/i}$

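2.18 An autoregressive model is defined by

$$\epsilon_i = \rho\,\epsilon_{i-1} + u_i$$

where $\epsilon_0 = 0$ and $\rho$ is a given constant, $-1 < \rho < 1$,

$u_i \sim N(0,\sigma^2)$, $i = 1,2,\ldots,n$, $E(u_i u_j) = 0$ for $i \ne j$.

(a) Show that $E(\epsilon_i) = 0$ for all $i$.

(b) Show that

$E(\epsilon_1\epsilon_i) = \rho^{i-1}\sigma^2$, $i = 1,2,\ldots,n$

(c) Show that

$E(\epsilon_2\epsilon_i) = \rho^{i-2}(1+\rho^2)\sigma^2$, $i = 2,3,\ldots,n$

(d) Show that

$E(\epsilon_i^2) = [1+\rho^2+\rho^4+\cdots+\rho^{2(i-1)}]\sigma^2$, $i = 1,2,\ldots,n$

(e) What is the limiting value of $E(\epsilon_i^2)$ as $i \to \infty$?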







—sL. p^(>-p^*) 

i-p* i-p^ 


,.| J !-p* 


2.19 Toss two pennies 12 times and record your results for the number of heads.
Repeat this three times. Compare the actual numbers of heads with the
expected number.

2J0 What arc the expected value and variance of the number of successes in iOO 
independent trials in each of which the probability of success is j ’’ What are 
the expected value and the vanance of the fraction of trials which are 
successes'* 

2.21 If $Y$ is normally distributed with expected value 100 and standard deviation
5, find (a) $P(Y > 100)$, (b) $P(Y < 95)$, (c) $P(|Y-100| < 10)$, (d) $P(Y > 95)$,
(e) $P(|Y-100| > 10)$, (f) the $y$ such that $P(Y > y) = .95$, (g) the $c$ such that
$P(|Y-100| < c) = .90$.

2.22 If $Y$ is normally distributed with expected value 100 and standard deviation
5, and if the average of a sample of 25 is $\bar Y$, find (a) $P(Y > 101)$, (b)
$P(\bar Y > 101)$, (c) $P(|\bar Y-100| > 2)$, (d) the $c$ such that $P(\bar Y > c) = .95$, (e) the
$c$ such that $P(|\bar Y-100| > c) = .05$.

2.23 If $Y$ has a Poisson distribution with $\lambda = 4$, find $P(3 \le Y \le 5)$ and compare
with the normal approximation, $P(2.5 < Y < 5.5)$ for $Y$ normal with $E(Y) = 4$
and $V(Y) = 4$.

2.24 Prove that $\mu$ and $\sigma^2$ are indeed mean and variance respectively of the
random variable whose density function is given by (2.8.15).

2.25 Derive (2.8.24).

2.26 Derive (2.8.26).

227 One of the convenient properties of the normal density functions is the 
following identity Venfy algebraically that for any real numbers x, |i, Hj, o, 
and 02 (<’i>0 O 2 > 0 ) 






where 


- + 2_ 

^ af + ai ’ of+cri 

2.28 Using (2.8.22), derive (2.8.23). Note that 

f°° dx=^ TT / a 

—CO 

j°° xtxp\^—a{x — fLY—b{x — p)^dx = e‘’^^^‘‘ ^ fi— ^ j Vw/o 

2.29 Eleven independent observations were taken from a normal population, 

{32.2, 32.7, 31.9, 32.9, 32.3, 31.7, 32.6, 32.5, 32.5, 32.2, 32.2). 

(a) Find the sample variance. 

(b) What is the probability that the sample variance of a random sample
from a normal population with 0 ^ = 6 would be larger than the value 
found in (a)? 

2.30 In addition to the sample of Problem 2.29, two observations from another
normal population with the same variance but not necessarily identical mean
were obtained. The sample variance of this sample of two is to be calculated.
What value will be exceeded with probability 5% by the ratio of this sample
variance to the sample variance of a random sample of 11 such as that of Problem 2.29?




Introduction to statistics 


In Chapter 2 we developed the idea of a probabilistic model. We studied
how a knowledge of the model may be used to derive probabilities of the
various possible outcomes in the sample space and of the various values of
a random variable defined on the sample space. We now turn to problems
in which the values of one or more parameters in the model are unknown
and we are to use observations to estimate them.

For example, assuming that the heat equation for an infinite slab
adequately represents the physical aspects of the experiment, we may
gather data on temperature changes on the face of an actual slab and
estimate thermal conductivity. We may assume exponential decay of
acoustical energy in a recording studio, measure acoustical energy as a
function of time at various points in the studio, and estimate the rate of
decay.

In studying methods of deriving estimates from data, it is worthwhile to
use different words to distinguish between a method of estimating and the
result of applying the method. An estimator is a formula or procedure for
deriving an estimate from a sample. An estimator is a random variable. An
estimate, by contrast, is a number. The word statistic is used both for a
formula for deriving a number or numbers, not necessarily an estimate,
from a sample and for the number obtained by applying the formula to a
particular sample.

In this chapter we consider what can be expected of estimators and how
to choose the best possible estimator for our immediate purposes. After we
look at one or two more or less specific problems, we begin a somewhat
systematic consideration of characteristics that we would like estimators to
have and investigate ways of devising estimators that possess these
characteristics.


3.1 SOME EXAMPLES OF ESTIMATORS 

Before beginning a formal study of methods of developing and evaluating 
estimators, we look at a few particular problems. These serve to introduce 
some basic ideas and raise some questions that are studied in the re- 
mainder of the chapter. 

3.1.1 Two Estimators of the Center of a Symmetric Distribution 

Suppose that we have a sample of $n$ independent observations $\{Y_i;\ i =
1,2,\ldots,n\}$ from the same distribution and that the one thing we know about
this distribution is that it is symmetric. Suppose that we wish to estimate
the center of symmetry. A random variable $X$ is symmetrically distributed
if $P\{X < \gamma - t\} = P\{X > \gamma + t\}$ for some $\gamma$ and for all $t$; $\gamma$ is the center of
symmetry. We might look for symmetry in the sample, but samples, even
from symmetric distributions, are unlikely to be symmetric. It is fairly
clear, however, that, if $\gamma$ is a center of symmetry, it is also the expected
value of $X$ if $X$ possesses an expected value and, in addition, it is a
median of $X$. By definition, $\lambda$ is a median of $X$ if

$$P\{X \le \lambda\} \ge \tfrac12 \quad\text{and}\quad P\{X \ge \lambda\} \ge \tfrac12 \qquad (3.1.1)$$

The median of a sample, $M$, is similarly defined as a number such that at
least half of the sample values are at least as large and at least half of the
sample values are at least as small. If there is an interval of medians, it is
quite common to name the average of the largest and the smallest median
as the median.

Two of the many possible estimators of the center of a symmetric 
probability distribution are the mean and the median of a sample of 
independent observations. 

If a continuous random variable $Y$ has distribution function $F_Y(y)$ and
probability density function $f_Y(y)$, the probability density function of the
median of a sample of an odd number $n$ of independent observations (for
simplicity, we deal only with the case of an odd number of observations)
can be shown to be

$$f_M(m) = \frac{n!}{\{[(n-1)/2]!\}^2}\,[F_Y(m)]^{(n-1)/2}\,[1-F_Y(m)]^{(n-1)/2}\,f_Y(m) \qquad (3.1.2)$$



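If

$$f_Y(y) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(y-\mu)^2}{2}\right] \qquad (3.1.3)$$

then $F_Y(y)$ is the corresponding normal distribution function,
where the notation of (2.8.18) is used. The probability density function for
the median of a sample of $n$ ($n$ odd) is given by (3.1.2). The density
function for the mean of a sample of $n$ is [by (2.8.21)]

$$f_{\bar Y}(\bar y) = \sqrt{\frac{n}{2\pi}} \exp\left[-\frac{n(\bar y-\mu)^2}{2}\right] \qquad (3.1.4)$$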

Figure 3.1 pictures the probability density function of mean and of median
for samples of size 101 from the distribution of (3.1.3). It is clear that the
mean is the better estimator from any reasonable point of view. The
probability that the mean of a sample of 101 will be within $\epsilon$ of the center
of symmetry of the distribution is, for all $\epsilon > 0$, greater than the probability
that the median of 101 will be within $\epsilon$ of the center. For $\epsilon = .12$,
$P(\mu - .12 < \bar Y < \mu + .12) = .77$; $P(\mu - .12 < M < \mu + .12) = .66$.

The comparison of mean and median of samples of 101 independent
observations on a double exponential random variable with probability
density function

$$f_Y(y) = \tfrac12 \exp(-|y-\mu|) \qquad (3.1.5)$$

is quite different. The probability density functions for the two estimators
are pictured in Fig. 3.2. The probability that the mean of a sample of 101
will be within $\epsilon$ of the center of the distribution is, for all $\epsilon > 0$, less than
the probability that the median will be within $\epsilon$ of the center:
$P(\mu - .12 < \bar Y < \mu + .12) = .61$; $P(\mu - .12 < M < \mu + .12) = .75$.
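
The four probabilities quoted above can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the calculation behind Figs. 3.1 and 3.2: it takes the center of symmetry to be 0, uses unit-variance normal observations and the double exponential density $\tfrac12 e^{-|y|}$, and estimates each probability from a few thousand simulated samples of 101. The names `coverage`, `draw_normal`, and `draw_double_exponential` are hypothetical.

```python
import random
import statistics


def draw_normal(rng):
    """One observation from the normal distribution with mean 0, variance 1."""
    return rng.gauss(0.0, 1.0)


def draw_double_exponential(rng):
    """One observation with density (1/2) exp(-|y|): an exponential
    magnitude with a random sign."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def coverage(draw, n=101, eps=0.12, reps=4000, seed=0):
    """Estimate P(|mean| < eps) and P(|median| < eps) for samples of size n
    whose true center of symmetry is 0."""
    rng = random.Random(seed)
    hits_mean = hits_median = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if abs(sum(sample) / n) < eps:
            hits_mean += 1
        if abs(statistics.median(sample)) < eps:
            hits_median += 1
    return hits_mean / reps, hits_median / reps


if __name__ == "__main__":
    print("normal            (mean, median):", coverage(draw_normal))
    print("double exponential (mean, median):", coverage(draw_double_exponential))
```

With a few thousand replications the estimates carry a sampling error of roughly .01, so they should agree with the quoted values only to about two decimals.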

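The choice of an estimator is not always as clear-cut even when the form
of the distribution is as fully known. It is possible, for example, that for
two different $\epsilon$'s, $\epsilon_1$ and $\epsilon_2$, and for two different estimators of $\theta$, $T_1$ and $T_2$,

$$P(|T_1-\theta| < \epsilon_1) < P(|T_2-\theta| < \epsilon_1)$$

and

$$P(|T_1-\theta| < \epsilon_2) > P(|T_2-\theta| < \epsilon_2)$$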

3.1.2 Estimating a Variance 

If we wish to estimate the variance $V(Y) = \sigma^2$ of a random variable $Y$ (see
Section 2.6.2) using a sample of $n$ independent observations, we might
consider the statistic

$$\frac{1}{n}\sum_{i=1}^{n} (Y_i-\mu)^2 \qquad (3.1.6)$$

To see that the expected value of this statistic is $\sigma^2$ (if $\sigma^2$ is finite), we note



Figure 3.2 Probability density functions of the mean and of the median for samples of 101 from the double exponential distribution.


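that

$$E[(Y_i-\mu)^2] = \sigma^2 \qquad (3.1.7)$$

by definition and

$$E\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i-\mu)^2\right] = \frac{1}{n}\sum_{i=1}^{n} E[(Y_i-\mu)^2] = \frac{1}{n}\sum_{i=1}^{n}\sigma^2 = \sigma^2 \qquad (3.1.8)$$

Unfortunately, to calculate (3.1.6) we must know $\mu$.
It seems reasonable to ask whether

$$A = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar Y)^2 \qquad (3.1.9)$$

might be used as an estimator of $\sigma^2$. It can, of course, but it will tend on
the average to underestimate $\sigma^2$. This can be seen by noting that

$$\sum_{i=1}^{n}(Y_i-\mu)^2 = \sum_{i=1}^{n}[(Y_i-\bar Y)+(\bar Y-\mu)]^2 = \sum_{i=1}^{n}(Y_i-\bar Y)^2 + \sum_{i=1}^{n}(\bar Y-\mu)^2 + 2(\bar Y-\mu)\sum_{i=1}^{n}(Y_i-\bar Y) = \sum_{i=1}^{n}(Y_i-\bar Y)^2 + n(\bar Y-\mu)^2 \qquad (3.1.10)$$

where the cross term vanishes because $\sum_{i=1}^{n}(Y_i-\bar Y) = 0$. Dividing the left
member by $n$ and taking expected value we get $\sigma^2$ [by (3.1.8)]. Dividing the
right side by $n$ and using (3.1.9) and (2.6.19) we get

$$E(A) + E[(\bar Y-\mu)^2] = E(A) + \frac{\sigma^2}{n} \qquad (3.1.11)$$

Solving for $E(A)$,

$$E(A) = \frac{n-1}{n}\,\sigma^2 \qquad (3.1.12)$$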



Rather than using $A$ as an estimator of $\sigma^2$ we use

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)^2 \qquad (3.1.13)$$

which is an unbiased estimator of $\sigma^2$, that is, its expected value is $\sigma^2$. We
use a capital $S$, as we did in Section 2.8.9, to emphasize the fact that we are
here talking about a random variable. We see that an estimator may
sometimes be modified to give another estimator the properties of which
please us more. Similar modifications of estimators of variance are used
throughout the following chapters.
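
The effect of the divisor can be verified exactly for a small discrete population. The sketch below is an illustration only: it assumes a population of equally likely values (a fair die is used in the usage example), enumerates every ordered sample of size $n$, and averages the divisor-$n$ and divisor-$(n-1)$ statistics with exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def population_variance(values):
    """Variance of a random variable taking each of `values` with equal probability."""
    values = list(values)
    mean = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mean) ** 2 for v in values) / len(values)


def expected_variance_estimates(values, n):
    """Exact expected values of the divisor-n statistic (3.1.9) and the
    divisor-(n-1) statistic (3.1.13) over all equally likely ordered samples."""
    values = list(values)
    total_a = total_s2 = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        sum_sq = sum((Fraction(v) - mean) ** 2 for v in sample)
        total_a += sum_sq / n
        total_s2 += sum_sq / (n - 1)
        count += 1
    return total_a / count, total_s2 / count


if __name__ == "__main__":
    faces = range(1, 7)                 # a fair die, used only as an example
    sigma2 = population_variance(faces)
    e_a, e_s2 = expected_variance_estimates(faces, n=3)
    print("sigma^2 =", sigma2, " E(A) =", e_a, " E(S^2) =", e_s2)
```

For samples of three from a fair die the enumeration returns $E(A) = \tfrac23\sigma^2$ and $E(S^2) = \sigma^2$, in agreement with (3.1.12).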

It is remarkable that we can derive the expected value of a sample 
variance for independent observations with no information about the 
distribution of the random variable except that it possesses a variance. In 
fact, any unbiased estimator of the standard deviation depends on much 
more detailed information about the distribution of the random variable. 

If $\{Y_i;\ i = 1,2,\ldots,n\}$ are independent normally distributed with common
variance $\sigma^2$, the statistic

$$W = \frac{(n-1)S^2}{\sigma^2} \qquad (3.1.14)$$

has a $\chi^2$ distribution (Section 2.8.8) with $n - 1$ degrees of freedom. The
distribution function of $S^2$ is the distribution function of $[\sigma^2/(n-1)]\chi^2$,
that is, a $\chi^2$ distribution with a scale change in the argument of the
function.


3.2 PROPERTIES OF ESTIMATORS 
3.2.1 Unbiasedness 

A statistic $T$ is said to be an unbiased estimator of a parameter $\theta$ if

$$E(T) = \theta \qquad (3.2.1)$$

Unbiasedness sounds like a good property for an estimator to have. It is
a simple property to describe. Roughly, an estimator is unbiased if, on the
average, it yields the correct value of the parameter.

Other properties are almost always more important to us, however.
Fortunately, we can sometimes find an unbiased estimator which also has
the other properties we want for a particular application.

The standard deviation of a sample from a normal population with
standard deviation $\sigma$ is biased, whether standard deviation of a sample is
defined with a divisor of $n$ or, as we have defined it, with an $n - 1$. In fact,
reference to the $\chi^2$ distribution (Section 2.8.8) yields

$$E(S) = \sigma\,\sqrt{\frac{2}{n-1}}\;\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]} \qquad (3.2.2)$$

For $n = 2$, $E(S) = .80\sigma$; for $n = 5$, $.94\sigma$; for $n = 10$, $.97\sigma$.
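
The three factors just quoted follow directly from (3.2.2). A minimal sketch, with the hypothetical function name `expected_s_over_sigma`:

```python
import math


def expected_s_over_sigma(n):
    """E(S)/sigma from (3.2.2) for a normal sample of size n (n >= 2)."""
    # The ratio of gamma functions is formed on the log scale to stay
    # finite for large n.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


if __name__ == "__main__":
    for n in (2, 5, 10, 30, 100):
        print(n, round(expected_s_over_sigma(n), 4))
```

The factor is always below 1 and approaches 1 as $n$ grows, so the bias of $S$ matters mainly for small samples.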
Example 3.2.1

What, if any, multiple of the average of $n$ independent observations on a gamma
random variable with known fixed $\alpha$ (see Section 2.8.7) will be an unbiased
estimator of $\lambda$?

Solution

Using the properties of gamma random variables described in Section 2.8.7 we see
that, if each observation has parameters $(\alpha, \lambda)$, the sum of $n$ has the parameters
$(n\alpha, \lambda)$. The expected value of the sum is $n\alpha\lambda$ and the expected value of the average
is therefore $\alpha\lambda$. The sample average multiplied by $1/\alpha$ will be an unbiased
estimator of $\lambda$.

3.2.2 Consistency 

Unless a sample is actually the whole population being sampled, an
estimate of a population parameter cannot be expected to be equal to that
parameter. If we have a sequence of estimators, one for each sample size,
we would hope that larger samples would tend to give better estimates.
A consistent sequence of estimators, $\{T(n);\ n = 1,2,\ldots\}$, of a parameter, $\theta$, is
one for which

$$\lim_{n\to\infty} P(|T(n)-\theta| < \epsilon) = 1 \quad\text{for every } \epsilon > 0 \qquad (3.2.3)$$

Thus a consistent sequence of estimators is one for which a sufficiently
large sample is almost certain to produce an estimate close to the
parameter value.

Example 3.2.2

Show that sample means form a consistent sequence of estimators of the
population mean if the distribution sampled possesses a variance.

Solution

If the distribution possesses a variance, the weak law of large numbers (see Section
2.7.2) says

$$\lim_{n\to\infty} P(|\bar X_n-\mu| < \epsilon) = 1 \qquad (3.2.4)$$

Hence, the sequence is consistent.
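
How quickly the probability in (3.2.4) approaches 1 can be tabulated. The sketch below is illustrative and rests on two labeled assumptions: the exact probability is computed for a normal population (the example itself needs only a finite variance), and the distribution-free figure is the Chebyshev lower bound $1 - \sigma^2/(n\epsilon^2)$. The function names are hypothetical.

```python
import math


def prob_mean_within(eps, n, sigma=1.0):
    """P(|mean - mu| < eps) for the mean of n normal observations.
    The mean is normal with standard deviation sigma/sqrt(n), so the
    probability equals erf(eps * sqrt(n) / (sigma * sqrt(2)))."""
    return math.erf(eps * math.sqrt(n) / (sigma * math.sqrt(2.0)))


def chebyshev_lower_bound(eps, n, sigma=1.0):
    """Distribution-free lower bound on the same probability, valid
    whenever the population variance sigma^2 is finite."""
    return max(0.0, 1.0 - sigma ** 2 / (n * eps ** 2))


if __name__ == "__main__":
    eps = 0.12
    for n in (10, 101, 1_000, 10_000):
        print(n, round(chebyshev_lower_bound(eps, n), 4),
              round(prob_mean_within(eps, n), 4))
```

Both columns tend to 1 as $n$ increases; the Chebyshev bound is much weaker at moderate $n$ but requires no assumption about the form of the distribution.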


3.2.3 Efficiency. Minimum Variance Unbiased Estimators


Recall Figure 3.1 picturing the distribution of the mean $\bar X$ and of the
median $M$ for samples of 101 from a normal population. The distribution
of $\bar X$ is, of course, normal (see Section 2.8.5). It appears that the
distribution of $M$ is approximately normal. A mathematical analysis confirms the
approximate normality of the distribution of $M$ for large sample size and
yields the additional information that the variance of $M$ is approximately
$\pi\sigma^2/2n$ compared with $\sigma^2/n$ for the variance of $\bar X$. In both cases $\sigma^2$ is the
variance of the normal population being sampled. If we wish a given
probability that our estimate will not differ from $\mu$ by more than a
specified amount, we could attain our objective using either $\bar X$ or $M$, but we
would need only $2/\pi$ as large a sample if we choose to use $\bar X$ rather than
$M$. We say that $M$ has an efficiency of $2/\pi$ relative to $\bar X$.

If $T_1(n)$ and $T_2(n)$ are unbiased estimators of $\theta$, the relative efficiency of
$T_1(n)$ relative to $T_2(n)$ is

$$\text{relative efficiency} = \frac{V[T_2(n)]}{V[T_1(n)]} \qquad (3.2.5)$$


In certain cases there is a smallest possible variance for unbiased
estimators based on $n$ independent observations. Under certain regularity
conditions, the variance of any unbiased estimator $T(n)$ of $\theta$ from a
distribution with probability function or probability density function $f(x|\theta)$ satisfies

$$V[T(n)] \ge \frac{1}{n\,E\left[\left(\dfrac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right]} \qquad (3.2.6)$$

This inequality is the Cramér-Rao or Cramér-Fréchet-Rao inequality.
(The regularity conditions are not satisfied if the range of possible values
of $X$ depends on $\theta$.)

If a lower bound on the variance of an unbiased estimator of $\theta$ is
calculable by (3.2.6), the efficiency of an unbiased estimator $T(n)$ of $\theta$ is
defined to be

$$\text{efficiency} = \frac{1}{n\,E\left[\left(\partial \ln f(X|\theta)/\partial\theta\right)^2\right]\,V[T(n)]} \qquad (3.2.7)$$


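Example 3.2.3

Find a lower bound to the variance of an unbiased estimator, using a sample of $n$,
of the mean of a normal random variable with known variance. Making use of the
fact that the variance of the median is approximately $\pi\sigma^2/2n$ for large $n$, find
the efficiency of the median as an estimator of the mean of a normal population.

Solution

Develop forms (3.2.6) and (3.2.7) for this specific case.

$$\ln f(x|\mu) = -\tfrac12 \ln 2\pi - \ln\sigma - \frac{(x-\mu)^2}{2\sigma^2} \qquad (3.2.9)$$

$$\frac{\partial \ln f(x|\mu)}{\partial\mu} = \frac{x-\mu}{\sigma^2} \qquad (3.2.10)$$

$$E\left[\left(\frac{\partial \ln f(X|\mu)}{\partial\mu}\right)^2\right] = \frac{E[(X-\mu)^2]}{\sigma^4} = \frac{1}{\sigma^2} \qquad (3.2.11)$$

$$V[T(n)] \ge \frac{\sigma^2}{n} \qquad (3.2.12)$$

A lower bound to the variance of an unbiased estimator of the mean of a normal
distribution with known variance using a sample of size $n$ is $\sigma^2/n$.

Since the variance of the median is $\pi\sigma^2/2n$, the efficiency of the median is

$$\frac{\sigma^2/n}{\pi\sigma^2/2n} = \frac{2}{\pi}$$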
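
The expectation in (3.2.11) can also be evaluated numerically, which is useful for densities where it has no closed form. The sketch below is an illustration for the normal case only: it integrates the squared score against the density with a midpoint rule over an interval of $\pm 10\sigma$ (an assumption made here for convenience), then forms the bound (3.2.6) and the efficiency of the median. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_information_mean(mu, sigma, half_width=10.0, steps=20_000):
    """E[(d ln f / d mu)^2] for one normal observation, by the midpoint
    rule on [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        score = (x - mu) / sigma ** 2          # derivative (3.2.10)
        total += score * score * normal_pdf(x, mu, sigma) * h
    return total


def cramer_rao_bound(n, mu, sigma):
    """Right member of (3.2.6) for the mean of a normal population."""
    return 1.0 / (n * fisher_information_mean(mu, sigma))


if __name__ == "__main__":
    n, mu, sigma = 101, 0.0, 1.0
    bound = cramer_rao_bound(n, mu, sigma)
    median_variance = math.pi * sigma ** 2 / (2 * n)   # large-n approximation
    print("bound:", bound, " efficiency of median:", bound / median_variance)
```

The numerical expectation reproduces $1/\sigma^2$ to many digits, so the bound equals $\sigma^2/n$ and the efficiency of the median comes out at $2/\pi \approx .64$.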

Whether or not the conditions under which (3.2.6) may be used are
satisfied, an estimator which has the minimum possible variance for an
unbiased estimator is called a minimum variance unbiased estimator. Many
commonly used estimators are minimum variance unbiased estimators.

Considerations of efficiency become much more complex if we do not
insist on unbiased estimators. Minimum variance is a meaningless criterion
since we can easily attain zero variance with a worthless estimator. For
example, we can always attain zero variance by always estimating the
parameter to be 100.

One useful generalization of minimum variance to situations in which
biased estimators are allowed is minimum expected squared deviation. A
minimum expected squared deviation estimator is an estimator $T$ of the
parameter $\theta$ for which $E[(T-\theta)^2]$ is the smallest possible for any estimator
of $\theta$. If $E(T)$ is $\theta$, a minimum expected squared deviation estimator is a
minimum variance unbiased estimator.

3.2.4 Sufficiency 

Roughly, a sufficient statistic is one which contains all the information 
from a sample which is relevant to the estimation of any property of the 
random variable being sampled. A sufficient statistic need not be an 
estimator; it need only contain all the information necessary for an 
estimator with the properties of any estimator we might care to devise. 

Before giving a definition of sufficient statistic, let us consider the
relative information contained in two estimators of the mean of a normal
distribution with known variance. If we are using a sample of size $n$ to
estimate $\mu$, we might ask whether knowing the sample median as well as
the sample mean would make it possible to better estimate the population
mean. It turns out that the conditional distribution of the sample median
given the sample mean does not depend on $\mu$. Furthermore, it turns out
that the conditional distribution of the observations given the sample mean
(a distribution of an $n - 1$ dimensional random variable) does not depend
on $\mu$ and hence, if the sample mean is known, no further information
about the sample is relevant to $\mu$. We call $\bar X$ a sufficient statistic for the
family of normal distributions with known variance.

Finding conditional distributions of observations given statistics can be
tedious. Fortunately, we can determine whether the conditional
distribution depends on the parameter without finding the conditional distribution
explicitly.

Definition

$T$ is a sufficient statistic for a family of distributions whose members are identified
by values of a parameter $\theta$ if the joint probability function or the joint probability
density function can be factored into two factors, one of which is the probability
function or the probability density function of $T$ and the other does not depend
on $\theta$.


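Example 3.2.4

For the family of distributions represented by the density function

$$f(x|\mu) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac12 (x-\mu)^2\right], \qquad -\infty < x < \infty \qquad (3.2.13)$$

we have the joint density function for $n$ independent observations

$$f(x_1,\ldots,x_n|\mu) = (2\pi)^{-n/2} \exp\left[-\tfrac12 \sum_{i=1}^{n}(x_i-\mu)^2\right], \qquad -\infty < x_i < \infty,\ i = 1,2,\ldots,n \qquad (3.2.14)$$

The density function of $\bar X$ is

$$f_{\bar X}(\bar x|\mu) = \sqrt{\frac{n}{2\pi}} \exp\left[-\frac{n}{2}(\bar x-\mu)^2\right] \qquad (3.2.15)$$

and the joint density factors as

$$f(x_1,\ldots,x_n|\mu) = f_{\bar X}(\bar x|\mu)\; n^{-1/2}(2\pi)^{-(n-1)/2} \exp\left[-\tfrac12 \sum_{i=1}^{n}(x_i-\bar x)^2\right] \qquad (3.2.16)$$

since

$$\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}(x_i-\bar x)^2 + n(\bar x-\mu)^2 \qquad (3.2.17)$$

Thus we see that $\bar X$ is a sufficient statistic in the family of normal distributions with
unit standard deviation.

For the normal family, $\bar x$ and $s$ are jointly sufficient for $\mu$ and $\sigma$.

3.2.5 Maximum Likelihood Estimators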

In the estimators so far considered, we have generally looked for some
meaning for the parameter and have sought an estimator which has a
similar meaning with respect to the sample. There are some more general
methods for deriving estimators. One of these is the method of maximum
likelihood. In essence this method consists of choosing, from among the
possible values for the parameter, the value which maximizes the
probability of obtaining the sample which was obtained.

In dealing with a family of probability distributions, we have found it
convenient to use the symbol $f(x_1,\ldots,x_n|\theta)$ to represent the joint
probability function or the joint probability density function for discrete or
continuous random variables, respectively. For each possible value of $\theta$,
$f(x_1,\ldots,x_n|\theta)$ defines the distribution of $X_1,\ldots,X_n$. We may wish to
consider more than one member of the family at a time; in this case $\theta$ is a
parameter. When $\theta$ is fixed, we know which function of $x_1,\ldots,x_n$ we are
dealing with. If we fix $\theta$ and $x_1,\ldots,x_n$, $f(x_1,\ldots,x_n|\theta)$ represents a number.

In the preceding paragraph we posed the problem of finding that member
of a family of probability distributions for which the probability of getting
the observations we got (or for continuous random variables, the
probability density for the observation we got) is greatest. Here we look at
$f(x_1,\ldots,x_n|\theta)$ as a function of $\theta$ for fixed values of $x_1,\ldots,x_n$. This function
is not a probability density function. We need a new name. If
$f(x_1,\ldots,x_n|\theta)$ is a probability function or probability density function for
the random variable $X_1,\ldots,X_n$ for fixed $\theta$, it is the likelihood function of $\theta$
for fixed $x_1,\ldots,x_n$. For example, the family of binomial probability
functions for a fixed $n$ can be described by

$$f(x|\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}, \qquad x = 0,1,\ldots,n, \quad 0 < \theta < 1$$

Although the random variable $X$ is discrete, the possible values of $\theta$ are
continuous. That $f(x|\theta)$ is not a probability density function for $\theta$ for fixed
$x$ is easily seen when we note that

$$\int_0^1 f(x|\theta)\,d\theta = \frac{1}{n+1}$$

and not one.

We use the letter $L$ for likelihood function. (Some authors use $L$ for the
logarithm of likelihood.)

Example 3.2.5

Consider an example involving the binomial distributions. If all values of $\theta$
between 0 and 1 are possible and if we observe $x$ successes in the $n$ trials, we
choose as our estimate that value of $\theta$ which maximizes $L = \binom{n}{x}\theta^x(1-\theta)^{n-x}$. The $\theta$
which maximizes $L$ also maximizes the natural logarithm of $L$ or

$$\ln L = \ln\binom{n}{x} + x\ln\theta + (n-x)\ln(1-\theta) \qquad (3.2.18)$$

which is easier to work with than $L$ in this case. Now take the first and second
derivatives of $\ln L$ to find

$$\frac{\partial \ln L}{\partial\theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta}, \qquad \frac{\partial^2 \ln L}{\partial\theta^2} = -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2} \qquad (3.2.19)$$

Since the second derivative is always negative, the value of $\theta$ which makes the first
derivative zero maximizes $\ln L$. Designating the estimator of $\theta$ by $\hat\theta$, we have

$$x(1-\hat\theta) - \hat\theta(n-x) = 0 \quad\text{or}\quad \hat\theta = \frac{x}{n} \qquad (3.2.20)$$
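
The closed-form result (3.2.20) can be compared with a direct numerical maximization of (3.2.18). The sketch below uses a plain grid search over the open interval (0, 1); the grid size and the function names are choices made for this illustration, not part of the example.

```python
import math


def binomial_log_likelihood(theta, x, n):
    """ln L of (3.2.18) for x successes in n trials, 0 < theta < 1."""
    return (math.log(math.comb(n, x))
            + x * math.log(theta)
            + (n - x) * math.log(1.0 - theta))


def score(theta, x, n):
    """First derivative of ln L with respect to theta, from (3.2.19)."""
    return x / theta - (n - x) / (1.0 - theta)


def grid_mle(x, n, points=10_001):
    """Value of theta on an evenly spaced grid in (0, 1) that maximizes ln L."""
    grid = [i / (points - 1) for i in range(1, points - 1)]
    return max(grid, key=lambda theta: binomial_log_likelihood(theta, x, n))


if __name__ == "__main__":
    x, n = 7, 20
    print("grid maximum:", grid_mle(x, n), " x/n:", x / n)
    print("score at x/n:", score(x / n, x, n))
```

A grid search is crude but makes no use of derivatives, so agreement with $x/n$ is an independent check on the calculus.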

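Example 3.2.6

For the rectangular distribution with range $(0,\theta)$

$$L = \frac{1}{\theta^n}, \qquad 0 < x_i < \theta, \quad i = 1,\ldots,n \qquad (3.2.21)$$

to maximize $L$ subject to $0 < x_i < \theta$ we make $\theta$ as small as possible under these
restrictions. This smallest possible value is $\max(x_1,x_2,\ldots,x_n)$. Hence

$$\hat\theta = \max(x_1,x_2,\ldots,x_n) \qquad (3.2.22)$$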

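Example 3.2.7

Find maximum likelihood estimators of $\mu$ and $\sigma$ for a normal distribution using a
sample $(X_1,\ldots,X_n)$.

Solution

We find $\ln L$, differentiate with respect to each of the parameters, and set
derivatives equal to zero, finding second derivatives to make sure we have a maximum.

$$L = (2\pi)^{-n/2}\,\sigma^{-n} \exp\left[-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right] \qquad (3.2.23)$$

$$\ln L = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2} \qquad (3.2.24)$$

$$\frac{\partial \ln L}{\partial\mu} = \frac{\sum_{i=1}^{n}(x_i-\mu)}{\sigma^2}, \qquad \frac{\partial^2 \ln L}{\partial\mu^2} = -\frac{n}{\sigma^2} \qquad (3.2.25a,b)$$

$$\frac{\partial \ln L}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma^3}, \qquad \frac{\partial^2 \ln L}{\partial\sigma^2} = \frac{n}{\sigma^2} - \frac{3\sum_{i=1}^{n}(x_i-\mu)^2}{\sigma^4} \qquad (3.2.26a,b)$$

$$\sum_{i=1}^{n}(x_i-\hat\mu) = 0, \qquad \hat\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\hat\mu)^2}{n} \qquad (3.2.27a,b)$$

where (3.2.27a) comes from setting the expression at (3.2.25a) equal to zero and
(3.2.27b) from setting the expression at (3.2.26a) equal to zero. The maximum
likelihood estimators of $\mu$ and $\sigma$ are then

$$\hat\mu = \bar x, \qquad \hat\sigma = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^{1/2} \qquad (3.2.28)$$

Now the second derivative with respect to $\mu$ is always negative. The second
derivative with respect to $\sigma$ at the point at which the two first derivatives are zero is

$$\frac{n}{\hat\sigma^2} - \frac{3n\hat\sigma^2}{\hat\sigma^4} = -\frac{2n}{\hat\sigma^2} \qquad (3.2.29)$$

which is negative. Hence we have maximum likelihood estimators.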
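
As a numerical check of (3.2.28), the estimators can be computed for a small data set and the log likelihood (3.2.24) compared at nearby parameter values. The sketch below uses the eleven observations listed in Problem 2.29 purely as sample data; the function names and the perturbation size are assumptions made for this illustration.

```python
import math

# Observations listed in Problem 2.29, used here only as sample data.
DATA = [32.2, 32.7, 31.9, 32.9, 32.3, 31.7, 32.6, 32.5, 32.5, 32.2, 32.2]


def log_likelihood(mu, sigma, xs):
    """ln L of (3.2.24) for a normal sample xs."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - n * math.log(sigma)
            - sum_sq / (2.0 * sigma ** 2))


def normal_mle(xs):
    """Maximum likelihood estimates (3.2.28): the sample mean and the
    root mean squared deviation (divisor n, not n - 1)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat


if __name__ == "__main__":
    mu_hat, sigma_hat = normal_mle(DATA)
    best = log_likelihood(mu_hat, sigma_hat, DATA)
    print("mu_hat =", mu_hat, " sigma_hat =", sigma_hat, " ln L =", best)
    for d_mu, d_sigma in ((0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)):
        nearby = log_likelihood(mu_hat + d_mu, sigma_hat + d_sigma, DATA)
        print("shift", (d_mu, d_sigma), "changes ln L by", nearby - best)
```

Every small shift lowers $\ln L$, as the negative second derivatives require. Note that $\hat\sigma$ uses the divisor $n$, so it differs from the $S$ of (3.1.13).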


3.2.6 Estimators a posteriori 

If we know something about which values our parameter is likely to have, 
we should be able to obtain better estimates by using this information. If 
our parameter is a random variable, a value of which has been chosen in 
accordance with its distribution, this value being an unknown constant 
throughout our experiment, Bayes’s theorem gives us a means of combin- 
ing this prior information with the results of our experiment. In cases in 
which our prior knowledge amounts to much less than a prior distribution 
of parameter values, we may still find it useful to form one or two 
hypothetical prior distributions and see what estimators are suggested by 
these prior distributions. 

For purposes of discussion here, we assume that a prior distribution of
our parameter is available. If this prior distribution has density function
$g(\theta)$ and the conditional probability function or probability density
function of our observations given $\theta$ is $f(x_1, x_2, \ldots, x_n|\theta)$, Bayes’s theorem tells
us that the posterior density function of our parameter is

$$g(\theta|x_1,\ldots,x_n) = \frac{g(\theta)\, f(x_1,\ldots,x_n|\theta)}{\int_{-\infty}^{\infty} g(u)\, f(x_1,\ldots,x_n|u)\, du} \tag{3.2.30}$$

If we have a prior probability function for the parameter, a sum appears in
place of the integral in this form. For some applications of (3.2.30) it is
convenient to note that $\theta$ appears in the right member only in the
numerator.


Example 3.2.8 

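If we think that the parameter of the Bernoulli distribution which we are investigating
is likely to be near 2/3, with probability density falling off to zero at $\theta = 0$ and
$\theta = 1$ and with expected value about .6, we might be interested in using the prior
probability density function $12\theta^2(1-\theta)$. We run our experiment and obtain $x$
successes in $n$ trials. Let us find the posterior density. Now

$$g(\theta) = 12\theta^2(1-\theta), \qquad 0 < \theta < 1 \tag{3.2.31}$$

$$f(x_1,\ldots,x_n|\theta) = \theta^x(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n \tag{3.2.32}$$

$$g(\theta|x_1,\ldots,x_n) = \frac{12\,\theta^{x+2}(1-\theta)^{n-x+1}}{\int_0^1 12\,u^{x+2}(1-u)^{n-x+1}\,du} = \frac{12\,\Gamma(n+5)\,\theta^{x+2}(1-\theta)^{n-x+1}}{12\,\Gamma(x+3)\,\Gamma(n-x+2)} = \frac{\Gamma(n+5)}{\Gamma(x+3)\,\Gamma(n-x+2)}\,\theta^{x+2}(1-\theta)^{n-x+1} \tag{3.2.33}$$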


If we use a body of data in conjunction with an initial prior distribution
to get a posterior distribution, then use the posterior as a prior in conjunction
with a second body of data to get a second posterior distribution, we
end up with the same posterior as though we combined the two bodies of
data and used the combined data in conjunction with the initial prior.

Fortunately, as data accumulate, the initial prior distribution matters
less and less. We may therefore look among functions which are easier to
work with for one which fits our feeling as to the correct prior. (We do
want to make sure that we do not initially rule out any possible value,
since once ruled out, it cannot later show up among the possible values no
matter what the data are.)
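The equivalence of sequential and combined updating can be illustrated with the prior of Example 3.2.8. The sketch below assumes the standard fact that a density proportional to $\theta^{a-1}(1-\theta)^{b-1}$ (a beta density) stays in the same family under Bernoulli sampling, so the posterior is tracked by the pair (a, b) alone. The function name `update_beta` and the success counts are hypothetical.

```python
def update_beta(a, b, successes, trials):
    """Posterior exponents (a, b) of theta**(a-1) * (1-theta)**(b-1) after Bernoulli data."""
    return a + successes, b + (trials - successes)

prior = (3, 2)  # 12 * theta**2 * (1 - theta), the prior of Example 3.2.8

# Two bodies of data, used one after the other (made-up counts).
step1 = update_beta(*prior, successes=4, trials=10)
step2 = update_beta(*step1, successes=7, trials=15)

# The same data pooled and used once with the initial prior.
combined = update_beta(*prior, successes=11, trials=25)

print(step2, combined)  # both are (14, 16)
```

Only the exponents are tracked here because the normalizing constant is determined by them.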

We may find a suitable prior among functions $g(\theta)$ which are proportional
to likelihood functions. In this case, $g(\theta|x_1,\ldots,x_n)$ is also proportional
to a possible likelihood function.

For an investigator who knows what family of distributions his data
come from but has no idea whatever from which member of the family
they come, a concept of “noninformative prior” distribution has been
developed. Noninformative priors have impressive properties and may in
many cases be the most suitable choice. There are, however, no universally
acceptable criteria for choosing a prior distribution.


3.2.7 Bayes Squared Error Loss Estimators MAP Estimators 

Some people feel uncomfortable with a distribution of parameter values
rather than a value. For such people there are, of course, a variety of
ways of using a parameter of the posterior distribution as a point estimate.
One possible estimator is the mode of the posterior distribution. This
estimator is called the maximum a posteriori estimator or MAP estimator.
The value of the parameter which maximizes the posterior density function
also maximizes the joint density function for parameter and observations,
the numerator of the right member of (3.2.30).

Example 3.2.9 

Find the MAP estimator for the estimation problem of Example 3.2.8. 


Solution 

Finding the first and second derivatives of the logarithm of $g(\theta|x_1,\ldots,x_n)$ [or of
$g(\theta) f(x_1,\ldots,x_n|\theta)$] with respect to $\theta$, we have

$$\frac{\partial \ln g(\theta|x_1,\ldots,x_n)}{\partial\theta} = \frac{x+2}{\theta} - \frac{n-x+1}{1-\theta} \tag{3.2.34}$$

$$\frac{\partial^2 \ln g(\theta|x_1,\ldots,x_n)}{\partial\theta^2} = -\frac{x+2}{\theta^2} - \frac{n-x+1}{(1-\theta)^2} \tag{3.2.35}$$

We see that the second derivative is always negative and hence the estimator may
be found by setting the first derivative equal to zero.

$$(x+2)(1-\hat{\theta}) - \hat{\theta}(n-x+1) = 0 \tag{3.2.36}$$

$$\hat{\theta} = \frac{x+2}{n+3} \tag{3.2.37}$$

If $n = 0$ and therefore $x = 0$, this posterior estimate reduces, as it should, to the
value which maximizes the prior distribution. As $n$ increases, the prior distribution
of $\theta$ matters less and less and the estimate of $\theta$ approaches the maximum likelihood
estimate, which is $x/n$.


Another possible estimator is the expected value of the posterior distrib- 
ution. This estimator is called the Bayes estimator for squared error loss. 

Example 3.2.10 

Find the Bayes squared error loss estimator for the estimation problem of Example
3.2.8.

Solution

$$E(\theta|x_1,\ldots,x_n) = \frac{\Gamma(n+5)}{\Gamma(x+3)\,\Gamma(n-x+2)} \int_0^1 \theta^{x+3}(1-\theta)^{n-x+1}\,d\theta = \frac{\Gamma(n+5)\,\Gamma(x+4)\,\Gamma(n-x+2)}{\Gamma(x+3)\,\Gamma(n-x+2)\,\Gamma(n+6)} = \frac{x+3}{n+5} \tag{3.2.38}$$

If $n = 0$, the expected value reduces, as it should, to the expected value of the
prior distribution. As $n$ increases, the prior distribution matters less and less and
the estimator $\hat{\theta}$ approaches the minimum variance estimator.

If, for the Bernoulli family, we take a prior distribution of $\theta$,

$$g(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad \alpha > 0,\ \beta > 0,\ 0 < \theta < 1$$

and observe $x$ successes in $n$ trials, the posterior distribution of $\theta$ will be

$$g(\theta|x_1,\ldots,x_n) = \frac{\Gamma(n+\alpha+\beta)}{\Gamma(x+\alpha)\,\Gamma(n-x+\beta)}\,\theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$$

The MAP estimator of $\theta$ is $(\alpha+x-1)/(\alpha+\beta+n-2)$ unless $\alpha + x < 1$ or
$\alpha+\beta+n < 2$ or both, in which case the estimator is $x$. The Bayes estimator
for squared error loss is $(\alpha+x)/(n+\alpha+\beta)$. Note that the MAP estimator
is the maximum likelihood estimator if $\alpha = \beta = 1$. The minimum variance
unbiased estimator of $\theta$ cannot be the Bayes estimator for squared error
loss since neither $\alpha$ nor $\beta$ can be zero. If $\alpha = \beta = 1/2$, the prior is the
noninformative prior and the MAP and Bayes squared error loss estimators
are on either side of the minimum variance unbiased estimator.
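The two point estimators for the beta prior can be computed directly from the posterior exponents. The following is a sketch under stated assumptions: the function name `beta_binomial_estimates` is hypothetical, exact fractions are used only to make the comparison with the formulas of Examples 3.2.9 and 3.2.10 transparent, and the MAP value is returned only when the posterior mode lies strictly inside (0, 1).

```python
from fractions import Fraction

def beta_binomial_estimates(alpha, beta, x, n):
    """Return (MAP, Bayes squared-error-loss) estimates of theta.

    The posterior is proportional to theta**(alpha+x-1) * (1-theta)**(beta+n-x-1).
    MAP is None when the interior-mode formula does not apply.
    """
    a_post = Fraction(alpha) + x
    b_post = Fraction(beta) + (n - x)
    bayes = a_post / (a_post + b_post)              # posterior expected value
    if a_post > 1 and b_post > 1:
        map_est = (a_post - 1) / (a_post + b_post - 2)  # posterior mode
    else:
        map_est = None                               # mode is on the boundary
    return map_est, bayes

# Prior of Example 3.2.8 corresponds to alpha = 3, beta = 2; the data are made up.
map_est, bayes = beta_binomial_estimates(3, 2, x=4, n=10)
print(map_est, bayes)                                # 6/13 and 7/15

# With no data the estimates are the prior mode and the prior expected value.
print(beta_binomial_estimates(3, 2, x=0, n=0))       # 2/3 and 3/5
```

For x = 4 and n = 10 the values agree with (x + 2)/(n + 3) and (x + 3)/(n + 5).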

3.2.8 Bayes Intervals 

If we have a probability distribution for our parameter $\theta$, whether an a
priori distribution or an a posteriori distribution (see Section 2.4.3), we can
find the probability that the parameter is in any interval of possible values.
If the distribution is continuous, we can find intervals for any desired
probability. For many purposes an interval of shortest length, that is, one
concentrated in the region of greatest probability density, is probably most
desirable. If the probability density function is unimodal and symmetric,
such an interval is relatively easy to find. If the density function is not
symmetric, we may be satisfied to use an interval which leaves out extreme
values at each extreme with equal probability. Thus for a distribution with
distribution function $F(\theta)$ we might choose the interval $(\theta_L, \theta_U)$ where
$F(\theta_L) = 1 - F(\theta_U)$. For a symmetric, unimodal distribution the two types of
intervals are identical.

We present herewith a Bayes interval for the case of normal prior of
normal expected value, when the prior distribution of $\mu$ has expected value
$\mu_0$ and variance $\sigma_0^2$ $(= \sigma^2/k)$ and the conditional distribution of
$X$ given $\mu$ has variance $\sigma^2$. For $n$ independent observations the posterior
distribution of $\mu$ is normal with expected value $(k\mu_0 + n\bar{X})/(k+n)$ and
variance $\sigma^2/(k+n)$. A Bayes interval with probability $\gamma$ for $\mu$ is

$$\left(\frac{k\mu_0 + n\bar{X}}{k+n} - \frac{z_{(1+\gamma)/2}\,\sigma}{(k+n)^{1/2}},\ \ \frac{k\mu_0 + n\bar{X}}{k+n} + \frac{z_{(1+\gamma)/2}\,\sigma}{(k+n)^{1/2}}\right)$$

where $\Phi(z_p) = p$. See Section 2.8.5.
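A Bayes interval of this kind is easy to evaluate numerically. The sketch below assumes the normal prior and normal observations described above; the function name `bayes_interval_normal_mean` and all numerical inputs are illustrative assumptions. It uses `statistics.NormalDist` from the Python standard library for the normal quantile.

```python
import math
from statistics import NormalDist

def bayes_interval_normal_mean(mu0, k, xbar, n, sigma, gamma):
    """Equal-tail Bayes interval for mu with prior N(mu0, sigma**2 / k).

    The posterior is normal with mean (k*mu0 + n*xbar)/(k+n)
    and variance sigma**2 / (k+n).
    """
    post_mean = (k * mu0 + n * xbar) / (k + n)
    post_sd = sigma / math.sqrt(k + n)
    z = NormalDist().inv_cdf((1 + gamma) / 2)   # z such that Phi(z) = (1+gamma)/2
    return post_mean - z * post_sd, post_mean + z * post_sd

# Made-up inputs: prior worth k = 4 observations centered at 50.
lo, hi = bayes_interval_normal_mean(mu0=50.0, k=4, xbar=52.26, n=16, sigma=5.0, gamma=0.95)
print(lo, hi)
```

Because the posterior is symmetric and unimodal, this equal-tail interval is also the shortest interval with probability gamma.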

3.2.9 Minimizing Expected Cost 

Presumably, in a practical situation a precise knowledge of the parameter
or parameters of the model of our system is useful. If we act on the basis of
an estimate of the parameter we fail to attain this utility. In comparing
estimators it is convenient to consider as a loss the decrease in utility
owing to using an estimate $t$ instead of the true value $\theta$ of the parameter.
The loss as a function of parameter and estimate is called a loss function
and denoted $\ell(\theta, t)$. Although in almost every case we cannot know the
value taken on by $\ell(\theta, t)$ since we do not know $\theta$, the properties of the
random variable $\ell(\theta, T)$, where $T$ is an estimator, may be amenable to
investigation. The expected value of $\ell(\theta, T)$ for given $\theta$ is particularly
interesting. It is called the risk and is denoted $r(\theta, T)$.

$$r(\theta, T) = E_x\big(\ell(\theta, T)\big) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \ell[\theta, T(x_1,\ldots,x_n)]\, f(x_1,\ldots,x_n|\theta)\, dx_1 \cdots dx_n \tag{3.2.39}$$

If we have to choose among estimators, we may be able to calculate
$r(\theta, T)$ for each estimator for values which are important to us and choose
that estimator for which the pattern of $r(\theta, T)$ looks best.

If we have a prior distribution of $\theta$, $g(\theta)$ say, we can go further in
selecting an estimator. For each estimator $T$, the random variable
$r(\theta, T)$ has a distribution. We can look at the distributions of those
estimators we are considering and choose the estimator whose distribution
we like best. Many investigators seem content to choose that estimator for
which the expected risk, $E_\theta[r(\theta, T)] = E_\theta E_x[\ell(\theta, T)]$, is least.

For the popular loss function $\ell(\theta, t) = k(t - \theta)^2$, minimum expected risk
is attained by taking $T$ as the posterior expected value of $\theta$, that is, the
expected value of the conditional distribution of $\theta$ given the observations.
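For a discrete observation the integral in the risk becomes a finite sum, which makes the comparison of risk patterns easy to carry out. The sketch below is an illustration only: it assumes a binomial observation with squared error loss, and the names `risk`, `mle`, and `shrunk` as well as the choice n = 10 are assumptions made for the example.

```python
from math import comb

def risk(p, n, estimator, loss=lambda p, t: (t - p) ** 2):
    """Risk r(p, T): expected loss of estimator T(x, n) when X is binomial(n, p)."""
    return sum(loss(p, estimator(x, n)) * comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))

mle = lambda x, n: x / n                  # maximum likelihood estimator
shrunk = lambda x, n: (x + 1) / (n + 2)   # an estimator pulled toward 1/2

for p in (0.05, 0.5):
    print(p, risk(p, 10, mle), risk(p, 10, shrunk))
```

Neither estimator has the smaller risk for every p: the second does better near p = 1/2 and worse near the ends of the interval, which is why a prior distribution (or some other criterion) is needed to choose between them.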



3.3 CONFIDENCE INTERVALS

If we have no prior distribution of our parameter, we may still give, along
with an estimate of the parameter, some indication of how far the estimate
may be expected to be from the true parameter value. In the following
discussion, it is important to remember the distinction between estimates
and estimators. An estimate is a number which is likely to differ from the
numerical value of the parameter being estimated. The estimator, the
formula by which an estimate is calculated, is a random variable and thus
has a probability distribution. The probability that it will take on a value,
the estimate, within 0.02 of the parameter being estimated may be calculable
without knowledge of the value of the parameter.

We may know that the estimator has probability .95 of yielding an
estimate within 0.03 of the value of the parameter. If our point estimate is
52.26, for example, we may report it as 52.26 ± 0.03. We do not thereby
mean to imply that the parameter value is between 52.23 and 52.29 but
merely that the estimator we have used is such that the probability is .95
that an interval constructed in this way will include the actual parameter
value. The interval itself may be called a confidence interval. It is clear that
if we had chosen a smaller probability than .95, the corresponding interval
would in all probability have been smaller. Larger probabilities go with
larger intervals. We must balance the advantage of being more definite
against the advantage of being more sure.

The idea of confidence interval is developed in the following sections for
particular cases but in a manner which, it is hoped, suggests how confidence
intervals may be constructed in other cases. We start with a case in
which the construction of the interval is simple and we can concentrate on
the properties of confidence intervals. We then move to more complex
constructions and more practical assumptions.

3.3.1 Confidence Intervals for the Mean of a Normal Population when the
Population Standard Deviation is Known

Let us apply the principles of the preceding section to find a confidence
interval for the mean $\mu$ of a normal population whose standard deviation is
known. Let $\{X_i\}$ be normal and independent, identically distributed (i.i.d.)
with $V(X_i) = \sigma^2$, a known constant, and let us use the distribution of $\bar{X}$ as a
basis for construction of a confidence interval. We begin by noting that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{3.3.1}$$

has a normal distribution with mean 0 and variance 1 [see (2.8.21)].
Thus if

$$P(-z < Z < z) = \Phi(z) - \Phi(-z) = \gamma \tag{3.3.2}$$

we have

$$P\left(\mu - z\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z\frac{\sigma}{\sqrt{n}}\right) = \gamma \tag{3.3.3}$$

and

$$P\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = \gamma \tag{3.3.4}$$

and thus a $\gamma$ confidence interval for $\mu$ of a normal population with known
variance $\sigma^2$, based on a sample of $n$, is the interval

$$\left(\bar{X} - z\frac{\sigma}{\sqrt{n}},\ \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) \tag{3.3.5}$$

where

$$\Phi(z) - \Phi(-z) = 2\Phi(z) - 1 = \gamma \tag{3.3.6}$$

Example 3.3.1

Sixteen independent observations on normally distributed random variables with
$\sigma = 5$ show $\bar{x} = 52.26$. Find a 95% confidence interval for $\mu$.

Solution

For $\gamma = .95$, $z = 1.96$ (see Table 2.13); hence the end points of the confidence
interval are

$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = 52.26 \pm 1.96\,\frac{5}{\sqrt{16}} = 52.26 \pm 2.45 = (49.81,\ 54.71)$$

The steps from (3.3.3) to (3.3.4) may be more meaningful if a graphical
method of construction is considered. In Fig. 3.3 we picture the $(\mu, \bar{x})$
plane. On the vertical line representing a particular value of $\mu$, we plot the
points representing the end points of the interval whose probability appears
in (3.3.3), $\mu - z(\sigma/\sqrt{n})$ and $\mu + z(\sigma/\sqrt{n})$. If we were to plot
corresponding points for each possible $\mu$, the loci of these points would be
the lowest and the uppermost sloping lines of the figure. The observed $\bar{x}$ is
represented as a point $(\mu, \bar{x})$. We do not know $\mu$ but we can draw a
horizontal line at height $\bar{x}$. Although we do not know $\mu$, we do know that
$(\mu, \bar{x})$ will fall between the two 45° lines just constructed with probability
$\gamma$. Viewing $\bar{x}$ as a random variable, the probability that the horizontal line
segment at height $\bar{x}$ between the two lines crosses the vertical line at $\mu$ is $\gamma$.
The range of $\mu$ represented by such a line segment for a specific $\bar{x}$ is called
a confidence interval for $\mu$. To avoid crowding in the figure, we picture a
case in which the sample did not lead to a confidence interval which
contained the parameter value.
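The arithmetic of Example 3.3.1 can be reproduced in a few lines. This is a sketch, not part of the original text: the function name `mean_ci_known_sigma` is hypothetical, the inputs are the ones behind the interval 52.26 ± 2.45 of the example, and the normal quantile is taken from `statistics.NormalDist` rather than from Table 2.13.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, gamma):
    """Symmetric gamma confidence interval for a normal mean with known sigma."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)   # solves 2*Phi(z) - 1 = gamma
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = mean_ci_known_sigma(xbar=52.26, sigma=5.0, n=16, gamma=0.95)
print(round(lo, 2), round(hi, 2))   # 49.81 54.71
```

The interval is random through its center; the half width depends only on sigma, n, and gamma.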



A confidence interval need not be symmetric. Indeed, since for a
random sample from a normal population with known $\sigma^2$,

$$P\left(\bar{X} > \mu - z\frac{\sigma}{\sqrt{n}}\right) = \Phi(z) \tag{3.3.7}$$

the interval

$$\left(-\infty,\ \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) \tag{3.3.8}$$

is a $\Phi(z)$ confidence interval for the mean. So also is

$$\left(\bar{X} - z\frac{\sigma}{\sqrt{n}},\ \infty\right) \tag{3.3.9}$$

One-sided confidence intervals should be used more often than they are.
Still, two-sided intervals are usually wanted and we seldom, if ever, refer to
one-sided confidence intervals.


3.3.2 Confidence Intervals for the Standard Deviation of a Normal 
Population 

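Suppose $\{X_i\}$ are normal and independent with mean $\mu$ and variance $\sigma^2$.
To find a confidence interval for $\sigma$ we begin by noting that

$$\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \tag{3.3.10}$$

has a $\chi^2$ distribution with $\nu = n - 1$ degrees of freedom. For $\gamma$ between 0
and 1, we can find a pair $(\chi_L^2, \chi_U^2)$ (in fact, many pairs) for which

$$P\left(\chi_L^2 < \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} < \chi_U^2\right) = F_{\chi^2}(\chi_U^2) - F_{\chi^2}(\chi_L^2) = \gamma \tag{3.3.11}$$

We solve the inequalities in the argument of the probability and obtain

$$P\left\{\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\chi_U^2}\right]^{1/2} < \sigma < \left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\chi_L^2}\right]^{1/2}\right\} = \gamma \tag{3.3.12}$$

which defines a $\gamma$ confidence interval for the standard deviation of a
normal population.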



If we wish the probability that the entire interval is above the population
standard deviation to be equal to the probability that the entire interval is
below the population standard deviation, we find $\chi_L^2$ and $\chi_U^2$ solving

$$F_{\chi^2}(\chi_L^2) = 1 - F_{\chi^2}(\chi_U^2) = \frac{1-\gamma}{2} \tag{3.3.13}$$

This interval is not the shortest possible interval for the specified
confidence coefficient.

Example 3.3.2

For a sample of size 11 from a normal population, we have found
$\sum (x_i - \bar{x})^2$ = 23lM
Find a 95% confidence interval for $\sigma$.

Solution

Since using (3.3.12) we see that

is a 95% confidence interval for $\sigma$.

Figure 3.4 shows the graphical construction of confidence intervals for the
standard deviation of a normal distribution using a sample of 10 (using nine
degrees of freedom). For fixed $\sigma$ we lay off ordinates of $\sigma(\chi^2_{.025}/9)^{1/2}$ and
$\sigma(\chi^2_{.975}/9)^{1/2}$. To get the confidence interval corresponding to $s$ for nine degrees
of freedom, draw a horizontal line at height $s$. The values of $\sigma$ at the points of
intersection with the two slanting lines give the end points of a 95% confidence
interval.

The distribution of $(n-1)s^2/\sigma^2$ is $\chi^2$ if the sample is from independent
normal distributions with the same mean and variance. Unfortunately, the
distribution of $(n-1)s^2/\sigma^2$ is sensitive to departures from normality in the
population being sampled. These confidence intervals must be used with
discretion.

3.3.3 Confidence Intervals for the Mean of a Normal Population when the
Population Standard Deviation is Unknown

In sampling from a normal population with known variance, we developed
confidence intervals for $\mu$ based on

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{3.3.14}$$

Figure 3.4 Construction of confidence intervals for $\sigma$ given $s$ from a normal distribution.

If $\sigma$ is not known, we might consider using

$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} \tag{3.3.15}$$

in its place. The resulting

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{3.3.16}$$

has a $t$ distribution (see Section 2.8.9). The added variability introduced by
$S$ causes $T$ to have a greater variance than $Z$, a variance which decreases
to that of $Z$ as $n$ increases.



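Analogous to (3.3.3) we have

$$P(-t < T < t) = \gamma \tag{3.3.17}$$

if

$$F_T(t) - F_T(-t) = 2F_T(t) - 1 = \gamma \tag{3.3.18}$$

Rearranging the inequalities we get

$$P\left(\bar{X} - t\frac{S}{\sqrt{n}} < \mu < \bar{X} + t\frac{S}{\sqrt{n}}\right) = \gamma \tag{3.3.19}$$

and thus we see that

$$\left(\bar{X} - t\frac{S}{\sqrt{n}},\ \bar{X} + t\frac{S}{\sqrt{n}}\right) \tag{3.3.20}$$

is a confidence interval with confidence coefficient $\gamma$.
Again, we easily get one-sided confidence intervals

$$\left(-\infty,\ \bar{X} + t\frac{S}{\sqrt{n}}\right) \tag{3.3.21}$$

or

$$\left(\bar{X} - t\frac{S}{\sqrt{n}},\ \infty\right) \tag{3.3.22}$$

in each of which the confidence coefficient is $F_T(t)$. It should be noted that
although the symmetric interval is relatively insensitive to departures from
normality in the observations, the one-sided intervals are sensitive to
skewness in that population.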


3.4 HYPOTHESIS TESTING

If we have competing hypotheses about the correct model for an experiment,
we need some means of deciding among the hypotheses. We could
reduce the problem to one of estimation by attaching an index number to
each hypothesis. Our problem would then be one of estimating the index
number of the true hypothesis. However, there are advantages to taking a
fresh point of view, especially when the choice is between two hypotheses.

In Section 3.4.1, we consider a choice between two simple hypotheses, 
that is, hypotheses which specify completely all parameters. In this case it 
is possible to develop a method of attack leading to procedures which 
simultaneously satisfy a wide variety of criteria for good decision proce- 
dures. In Section 3.4.2 we see that some decisions between compound 
hypotheses can be treated as decisions between simple hypotheses. 

In Section 3.4.3 we broaden our scope to include other compound 
hypotheses. We shall find it convenient to consider these problems as 
requiring decisions as to whether our sample came from a distribution 
which is a member of a particular subclass of all the distributions we admit 
as possibilities. Examples of such problems are (1) deciding whether our 
sample came from a normal distribution with known mean and unknown 
variance or from some other normal distribution, and (2) deciding whether 
a regression curve is first or second degree. 

The structure of a decision making procedure should depend on the 
costs of gathering data and of making wrong decisions. To a first ap- 
proximation, the expected costs of wrong decisions may be taken to 
depend only on the probabilities of wrong decisions and the cost of 
gathering data to depend only on sample size. We consider only sampling 
plans for which the sample size is fixed before sampling begins. 

3.4.1 Two Simple Hypotheses 

A simple hypothesis is one in which the model is completely specified (with 
no unknown parameters). Consider a case in which we are to choose 
between two simple hypotheses. We can find the distribution of possible 
samples for each hypothesis. If the cost of choosing the wrong hypothesis 
depends only on which hypothesis is true, the development of a test falls 
neatly into two parts. First, we can order the possible samples with respect
to the degree to which each favors one of the hypotheses, $H_2$, say, over the
other, $H_1$, say. Secondly, we determine a critical point, a point in the
ordering such that, if our sample is on one side of this point we choose to
consider $H_1$ as true and if on the other we choose to consider $H_2$ as true.
(For those samples which have the same position in the ordering as the 
critical point, we may choose to include them with one side or the other, or 
divide them between the two sides.) 

The ordering of possible samples can be accomplished by computing, for
each possible sample, the ratio of (1) the probability of that sample if $H_2$
were true to (2) the probability of that sample if $H_1$ were true:

$$\frac{P(x_1,\ldots,x_n|H_2)}{P(x_1,\ldots,x_n|H_1)} \tag{3.4.1}$$

or, if we are dealing with continuous random variables and therefore with
density functions, we compute the ratio

$$\frac{f_{X_1,\ldots,X_n|H_2}(x_1,\ldots,x_n)}{f_{X_1,\ldots,X_n|H_1}(x_1,\ldots,x_n)} \tag{3.4.2}$$

If the observations are independent, both the numerator and denominator
can be written as products of factors involving the individual observations.
The samples are ordered from those most favorable to $H_1$ (with ratio
closest to 0), to those most favorable to $H_2$ (the largest possible ratio). To
decide where in the ordering to make the division between those which
cause us to accept one hypothesis and those which cause us to accept the
other, we compute for each hypothesis and for each possible point of
division the probability of making a wrong decision. Changing the point of
division cannot simultaneously decrease both the probability of deciding
$H_1$ in case $H_2$ is true and the probability of deciding $H_2$ in case $H_1$ is true.
Among the pairs of probabilities, we choose the pair most acceptable to us.

Example 3.4 I 

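Let both hypotheses specify independent normal observations with $\sigma = 16$. Let
$n = 25$. Let $H_1$ specify $\mu = 10$ and $H_2$ specify $\mu = 15$. Find a decision procedure if the
probability of deciding $H_2$ when $H_1$ is true is to be twice the probability of deciding
$H_1$ when $H_2$ is true.

Solution

Under $H_1$,

$$f_{X_1,\ldots,X_n|H_1}(x_1,\ldots,x_n) = \frac{1}{16^n(2\pi)^{n/2}}\exp\left[-\frac{\sum_{i=1}^{n}(x_i-10)^2}{512}\right] \tag{3.4.3}$$

and under $H_2$,

$$f_{X_1,\ldots,X_n|H_2}(x_1,\ldots,x_n) = \frac{1}{16^n(2\pi)^{n/2}}\exp\left[-\frac{\sum_{i=1}^{n}(x_i-15)^2}{512}\right] \tag{3.4.4}$$

Let us find the ratio given by (3.4.2):

$$\frac{f_{X_1,\ldots,X_n|H_2}(x_1,\ldots,x_n)}{f_{X_1,\ldots,X_n|H_1}(x_1,\ldots,x_n)} = \exp\left\{-\frac{1}{512}\left(\sum x_i^2 - 30\sum x_i + 25\cdot 225 - \sum x_i^2 + 20\sum x_i - 25\cdot 100\right)\right\} = \exp\left\{\frac{1}{512}\left(10\sum x_i - 25\cdot 125\right)\right\} \tag{3.4.5}$$

As $\sum x_i$ or $\bar{x}$ increases, the ratio increases. Hence the ordering is in accord with the
values of $\bar{x}$. Our decision rule will be: decide $H_2$ if $\bar{x} > k$, where $k$ is still to be
determined. To find $k$ we use

$$P(\bar{X} > k\,|\,H_1) = 1 - \Phi\left(\frac{k-10}{3.2}\right) \tag{3.4.6}$$

$$P(\bar{X} \le k\,|\,H_2) = \Phi\left(\frac{k-15}{3.2}\right) \tag{3.4.7}$$

where $3.2 = \sigma/\sqrt{n} = 16/5$. By trial and error, we find the $k$ which makes

$$P(\bar{X} > k\,|\,H_1) = 2P(\bar{X} \le k\,|\,H_2) \tag{3.4.8}$$

to be $k = 11.68$.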
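The trial-and-error search for k in Example 3.4.1 can be replaced by bisection. The sketch below assumes the setting of the example as read here ($\sigma = 16$, $n = 25$, means 10 and 15 under the two hypotheses, and the probability of deciding $H_2$ under $H_1$ equal to twice the probability of deciding $H_1$ under $H_2$); the function names are hypothetical.

```python
from statistics import NormalDist

def error_probabilities(k, mu1=10.0, mu2=15.0, sigma=16.0, n=25):
    """Return (P(decide H2 | H1), P(decide H1 | H2)) for the rule 'decide H2 if xbar > k'."""
    se = sigma / n ** 0.5                          # standard deviation of the sample mean
    p_h2_given_h1 = 1 - NormalDist(mu1, se).cdf(k)
    p_h1_given_h2 = NormalDist(mu2, se).cdf(k)
    return p_h2_given_h1, p_h1_given_h2

def solve_k(lo=10.0, hi=15.0, tol=1e-8):
    """Bisection on g(k) = P(H2|H1) - 2*P(H1|H2), which decreases as k increases."""
    def g(k):
        a, b = error_probabilities(k)
        return a - 2 * b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid       # first error still too large: move the critical point up
        else:
            hi = mid
    return (lo + hi) / 2

k = solve_k()
print(round(k, 2), error_probabilities(k))
```

At the solution the two error probabilities are roughly .30 and .15, which shows how much overlap remains between the two hypotheses with this sample size.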

3.4.2 Problems Reducible to Problems of Two Simple Hypotheses 
Example 3.4.2 

As a purchaser of electric light bulbs, we wish to make reasonably sure, by 
sampling inspection, that the average life of the bulbs in each consignment is not 
too low. We agree with the manufacturer on an accelerated life test. We believe 
that the distribution of life under the accelerated test will, for production under 
stable conditions, be sufficiently close to normal that we can safely assume 
normality in our analysis. We believe that the standard deviation of results of our 
accelerated life test will be very close to 200 hr. We wish a good chance, .95, of 
rejecting a consignment if the expected life is as low as 1000 hr and a reasonably 
good chance, .90, of accepting an expected value as good as 1100 hr. How large a 
sample is needed and what values of x will cause us to reject a consignment? 

Solution 

We shall reject a consignment if $\bar{x}$, for a sample of size $n$, is less than $k$, where $n$ and
$k$ are to be found. The constraints we have introduced are

$$P(\bar{x} < k\,|\,\mu \le 1000) \ge .95 \quad\text{and}\quad P(\bar{x} > k\,|\,\mu \ge 1100) \ge .90$$

which is equivalent to

$$P(\bar{x} < k\,|\,\mu = 1000) = .95 \quad\text{and}\quad P(\bar{x} > k\,|\,\mu = 1100) = .90$$

Using (2.8.21) we see that these are equivalent to

$$P\left(\frac{\bar{x}-1000}{200}\sqrt{n} < \frac{k-1000}{200}\sqrt{n}\right) = .95$$

and

$$P\left(\frac{\bar{x}-1100}{200}\sqrt{n} > \frac{k-1100}{200}\sqrt{n}\right) = .90$$

or

$$\frac{k-1000}{200}\sqrt{n} = 1.645 \quad\text{and}\quad \frac{k-1100}{200}\sqrt{n} = -1.282$$

Solving for $n$ and $k$ we find that we should use a sample of size 35 and reject the
consignment if $\bar{x} < 1056$.

We have just seen how a problem in test construction may sometimes be
solved by methods designed to construct decision procedures for deciding
between two simple hypotheses.
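The two equations for n and k in Example 3.4.2 can be solved in closed form: subtracting them eliminates k and gives the square root of n. The sketch below is an illustration under the assumptions of the example (normal life, standard deviation 200 hr); the function name `sampling_plan` and its parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def sampling_plan(mu_reject=1000.0, mu_accept=1100.0, sigma=200.0,
                  p_reject=0.95, p_accept=0.90):
    """Sample size n and critical value k for the rule 'reject if xbar < k'."""
    z_r = NormalDist().inv_cdf(p_reject)    # about 1.645
    z_a = NormalDist().inv_cdf(p_accept)    # about 1.282
    # (k - mu_reject)*sqrt(n)/sigma = z_r and (k - mu_accept)*sqrt(n)/sigma = -z_a
    root_n = (z_r + z_a) * sigma / (mu_accept - mu_reject)
    n = math.ceil(root_n ** 2)              # round the sample size up
    k = mu_reject + z_r * sigma / root_n    # critical value from the unrounded solution
    return n, k

n, k = sampling_plan()
print(n, round(k))   # 35 1056
```

Rounding n up while keeping k from the unrounded solution means both probability requirements are slightly exceeded.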


3.4.3 Generalized Likelihood Ratio Tests Power


For tests of hypothesis which do not reduce to the case of choice between
two simple hypotheses, we describe only one type of test, the generalized
likelihood ratio test. Suppose that observations come from one of a (broad)
class of distributions and we want to test the hypothesis, called the null
hypothesis and symbolized $H_0$, that the observations come from a distribution
belonging to a particular subclass. We form a likelihood ratio $\lambda$. For
the numerator, we use the maximum of the likelihood over all distributions
belonging to the subclass. For the denominator we use the maximum of
the likelihood over all distributions of the (broad) class.

$$\lambda = \frac{\max_{H_0} L}{\max L} \tag{3.4.9}$$

The ratio must be between zero and one; the smaller the ratio, the less we
are inclined to accept the null hypothesis. The decision rule will be: reject
$H_0$ if $\lambda < \lambda_0$, where $\lambda_0$ is determined so that

$$P(\lambda < \lambda_0\,|\,H_0) = \alpha \tag{3.4.10}$$

where $\alpha$ is the level of significance of the test.


Example 3.4.3

Assuming that we have $n$ observations from a normal distribution, test the
hypothesis that $\mu = \mu_0$ using a 5% level of significance.

Solution

The broad class we are considering is that of all normal distributions. The subclass
is that of all normal distributions with $\mu = \mu_0$. In either case the likelihood function
has the form

$$L = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right] \tag{3.4.11}$$

$L$ is maximized in the (broad) class by replacing $\mu$ by $\bar{x}$ and $\sigma^2$ by
$[\sum_{i=1}^{n}(x_i-\bar{x})^2]/n$, giving us

$$\max L = \frac{n^{n/2}e^{-n/2}}{(2\pi)^{n/2}\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{n/2}} \tag{3.4.12}$$

$L$ is maximized under the null hypothesis (i.e., in the subclass) by replacing $\mu$ by
$\mu_0$ and $\sigma^2$ by $[\sum_{i=1}^{n}(x_i-\mu_0)^2]/n$, giving us

$$\max_{H_0} L = \frac{n^{n/2}e^{-n/2}}{(2\pi)^{n/2}\left[\sum_{i=1}^{n}(x_i-\mu_0)^2\right]^{n/2}} \tag{3.4.13}$$

The likelihood ratio, $\lambda$, is given by

$$\lambda = \left[\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\mu_0)^2}\right]^{n/2} \tag{3.4.14}$$

Since $\sum_{i=1}^{n}(x_i-\mu_0)^2$ of the last form can be written

$$\sum_{i=1}^{n}(x_i-\mu_0)^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu_0)^2 \tag{3.4.15}$$

we can write

$$\lambda^{2/n} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu_0)^2} = \frac{1}{1 + \dfrac{1}{n-1}\,\dfrac{n(\bar{x}-\mu_0)^2}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]/(n-1)}} = \frac{1}{1 + t^2/(n-1)} \tag{3.4.16}$$

where $t$ is the familiar $t$ of Section 2.8.9. Since small values of $\lambda$ correspond to large
values of $t^2$, the decision rule will read: reject $H_0$ if $|t| > t_0$, where $t_0$ is the solution
of $F_{T,\,n-1\ \mathrm{df}}(t_0) = 1 - (\alpha/2)$. If $n = 11$, we have 10 degrees of freedom. Entering
Table 2.15 with $\nu = 10$, $F_T(t) = .975$, we find $t_0 = 2.23$.

The power of the test to detect deviations from the null hypothesis is
defined to be the probability of rejecting the null hypothesis as a function
of the specific distributions in the broad class of alternatives.
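The relation between the likelihood ratio and the t statistic in Example 3.4.3 can be verified on any sample. The sketch below uses made-up data of size 11 and a hypothetical function name `glr_and_t`; the critical value 2.23 is the table value quoted in the example for 10 degrees of freedom.

```python
import math

def glr_and_t(xs, mu0):
    """Return (lambda, t) for testing mu = mu0 in a normal population with unknown variance."""
    n = len(xs)
    xbar = sum(xs) / n
    ss_xbar = sum((x - xbar) ** 2 for x in xs)   # sum of squares about the sample mean
    ss_mu0 = sum((x - mu0) ** 2 for x in xs)     # sum of squares about the hypothesized mean
    lam = (ss_xbar / ss_mu0) ** (n / 2)
    s = math.sqrt(ss_xbar / (n - 1))
    t = (xbar - mu0) / (s / math.sqrt(n))
    return lam, t

data = [31.2, 29.8, 33.1, 30.5, 32.4, 28.9, 31.7, 34.0, 30.1, 32.8, 31.5]  # made-up sample
mu0 = 30.0
lam, t = glr_and_t(data, mu0)
n = len(data)

# lambda**(2/n) should equal 1 / (1 + t**2 / (n - 1)).
print(lam ** (2 / n), 1 / (1 + t * t / (n - 1)))
print("reject H0" if abs(t) > 2.23 else "do not reject H0")
```

The identity holds for every sample, which is why the test can be carried out with the t table alone and the ratio itself never needs to be computed.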


REFERENCE 

1. Box, G. E. P. and Tiao, G. C., Bayesian Inference in Statistical Analysis, Addison-Wesley
Publishing Co., Reading, Mass., 1973.


PROBLEMS 


3.1 For the following family of probability distributions, find all possible
samples of 3 (with replacement), find the mean, the median, and the probability
of each, and find $P(|\bar{X}-\theta| < |M-\theta|)$ and $P(|\bar{X}-\theta| > |M-\theta|)$ when $\bar{X}$ is
the sample mean and $M$ is the sample median.

    x          θ−2    θ−1    θ     θ+1    θ+2

    P(X = x)   .1     .2     .4    .2     .1

Answer. Working through this problem may be somewhat tedious but certain
short cuts are possible. There are 35 possible samples if order of observations is not
considered. The different possible orders of observations for each possible sample
must be taken into account in calculating probabilities. You will find that

$$P(|\bar{X}-\theta| < |M-\theta|) = .294 \quad\text{and}\quad P(|M-\theta| < |\bar{X}-\theta|) = .408$$


3.2 A sample of 10 independent observations on a random variable is given
below. Using the usual unbiased estimators, find estimates of the expected
value and of the variance of the population sampled.
137.2, 138.4, 136.8, 137.5, 137.4, 137.2, 137.9, 136.9, 137.4, 137.6


33. 

34 


Is the average of n independent observations an unbiased estimator of 8 if 

the density function for eadi X is/ ® 0<x<oo 

I 0 otherwise 

If X has a binomial distribution with parameters n and p, show that 



nit, — n\ . 

I^Remember £(Y*)- 





3.5 For the random variable $Y$, $P(Y=y) = e^{-\lambda}\lambda^y/y!$; find an unbiased estimator
of $\lambda^2$ based on one observation.

Answer. $Y(Y-1)$.

3.6 An estimator of the least upper bound of possible values of a given random
variable is the maximum of a sample of independent observations. If the
random variable has a rectangular distribution with density $\theta^{-1}$ over the
interval $(0, \theta)$ and if we use sample size $n$, (1) find an unbiased estimator of $\theta$
of the form $k$ times the maximum observation, (2) find an unbiased estimator
of $\theta$ of the form $k$ times the average of the observations, (3) find the
relative efficiencies of the two estimators.

3.7 For the random variable $Y$ distributed uniformly over $(0, \theta)$, is
$\max(y_1, \ldots, y_n)$ an unbiased estimator of $\theta$?

3.8 Find the minimum possible variance of an unbiased estimator of $p$ when
$P(Y=y) = \binom{n}{y}p^y(1-p)^{n-y}$.

Answer. $p(1-p)/n$.

3.9 For a binomial random variable generated by $n$ independent trials, each with
probability $p$, show that $x$ = number of successes/number of trials is a
sufficient statistic.

3.10 If 

x>0 

< 0 otherwise 

find the maximum likelihood estimator of 9 given a sample of n independent 
observations. 

3.11 Find the maximum likelihood estimator of $\lambda$ from a sample of $n$ independent
observations from the Poisson probability function $P(Y=y) = e^{-\lambda}\lambda^y/y!$.

3.12 Find the maximum likelihood estimator of $\theta$ from a sample of $n$ independent
observations from the uniform distribution over the interval $(-\theta, \theta)$.

Answer. $\max\{|X_1|, |X_2|, \ldots, |X_n|\}$.

3.13 If $p$ is a random variable with density function $6p(1-p)$ for $0 < p < 1$
and 0 otherwise, and if $X$ is a binomial random variable with parameters
$n$ and $p$, (1) find the posterior distribution of $p$ for a given value of $x$,
(2) find the expected value of the posterior distribution, and (3) find the
maximum posterior likelihood estimator. [Remember $\int_0^1 p^{m-1}(1-p)^{n-1}\,dp
= \Gamma(m)\Gamma(n)/\Gamma(m+n)$.]

Answer (partial). (2) $(x+2)/(n+4)$; (3) $(x+1)/(n+2)$.

3.14 Show that the likelihood ratio test of the hypothesis that $\lambda = \lambda_0$, where $\lambda$ is the
parameter of the exponential distribution with density function

$$f(x) = \begin{cases} \dfrac{1}{\lambda}e^{-x/\lambda} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

will have a rejection region which is the solution of an inequality of the form
$(\bar{x}/\lambda_0)\exp(-\bar{x}/\lambda_0) <$ constant.

(a) find r[p.pi(x)] for/,(*)«(je+l)/3, 

(b) find/-l/>./o(Jt)ifOf/t»(Jc)“5 

(c) H 2 (p)= 2(1 'P),0< 1, find E )} 

3.16 If $X$ has a binomial distribution with $n = 4$, if $\hat{p} = (x+1)/6$, and if $\ell(p, \hat{p}) =
(\hat{p} - p)^2$, find $r(p, \hat{p})$.

Answer. ^ 

3.17 From a normal distribution with standard deviation 5 a sample of 16
independent observations was obtained. $\bar{x}$ was calculated and found to be
31.5. Find a 90% confidence interval for $\mu$.
Answer. (29.44, 33.56)

3.18 From a normal distribution a sample of 16 was obtained. $\bar{x}$ and $s$ were
calculated and found to be 31.5 and 5, respectively. Find a 90% confidence
interval for $\mu$.
Answer. (29.31, 33.68)

3.19 From a normal distribution with standard deviation 5 a sample of 16
independent observations was obtained. $\bar{x}$ was calculated and found to be
31.5. Test at the 5% level of significance the hypothesis that $\mu = 30$ against
alternatives that $\mu \ne 30$.

3.20 If + using one observation, a re- 

jection region a: > 9 was decided upon Find the power function of this test 

Answer. I + 045<1^ 



CHAPTER 4


PARAMETER ESTIMATION METHODS


4.1 INTRODUCTION 

In this chapter we introduce some of the concepts which we shall develop 
in the remainder of the book. Some canonical forms for the models of 
problems of parameter estimation are presented, least squares estimators 
are described, and modifications suggested by various criteria for good 
estimators are mentioned. We close the chapter with a short discussion of 
simulation techniques for comparing methods of estimation. 


4.2 RELATIONS BETWEEN OBSERVED RANDOM VARIABLES AND PARAMETERS

We assume a functional relationship among several measurable variables,
$(Y, X_1, \ldots, X_k)$, one or more parameters $(\beta_1, \beta_2, \ldots, \beta_p)$, and particular
values of one or more random variables $(\varepsilon^{(1)}, \varepsilon^{(2)}, \ldots)$. (In this book we deal
almost exclusively with cases in which each observation involves only one
random variable; thus we have no need for the index indicated by the
superscript.) The measurement of the measurable variables provides the
observations. The parameters are unknown and we wish to estimate at
least some of them. The particular values of the random variables, the
$\varepsilon$'s, are unknown. We may estimate the $\varepsilon$'s themselves in order to get a
picture of how well the estimates fit with our preconception of the
distribution of the $\varepsilon$'s.

It is convenient to pick one of the measurable variables and express this
variable in terms of the others. It is the picked variable with which we
associate the pronoun Y. Thus we write

$$Y = \eta(X_1, \ldots, X_k, \beta_1, \ldots, \beta_p, \varepsilon) \tag{4.2.1}$$

The variable to be called Y is traditionally chosen because we are
interested in how its value is affected by the values assumed by
$X_1, \ldots, X_k$. It is traditionally called the dependent variable. The
variables $X_1, \ldots, X_k$ are called the independent variables. (Be clear in
noting that independent in this context does not mean independence in the
statistical sense.) The X's may be thought of as the causes of the Y, as
when Y represents the yield in a chemical process into which amounts
$X_1, \ldots, X_k$ of material from sources $1, \ldots, k$ are combined. The X's may
merely describe the physical environment, as when Y represents temperature
at point $(X_1, X_2, X_3)$ in space at time $X_4$.

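We are fortunate if the $\varepsilon$'s can be combined into one $\varepsilon$ and especially
fortunate if the errors are additive, that is, if we can write

$$Y = \eta(X_1, \ldots, X_k, \beta_1, \ldots, \beta_p) + \varepsilon \tag{4.2.2}$$

in which the distribution of $\varepsilon$ does not depend on the unknown $\beta$'s,
although it may depend on parameters which do not appear in the other
term of (4.2.2). It will sometimes be convenient to index the $\beta$'s by
integers beginning with 0, thus $\beta_0, \beta_1, \ldots, \beta_{p-1}$. It is convenient to use
vector notation to abbreviate, thus

$$X = (X_1, \ldots, X_k)$$

$$\beta = (\beta_1, \ldots, \beta_p) \quad \text{or} \quad (\beta_0, \beta_1, \ldots, \beta_{p-1}) \text{ as appropriate}$$

and

$$Y = \eta(X, \beta) + \varepsilon \tag{4.2.3}$$

Thus the ith observation will be signified by adding a subscript i to
$(Y, X_1, \ldots, X_k)$, that is, $(Y_i, X_{i1}, \ldots, X_{ik})$, found by (4.2.2) or (4.2.3) to be
such that

$$Y_i = \eta(X_{i1}, \ldots, X_{ik}, \beta) + \varepsilon_i = \eta(X_i, \beta) + \varepsilon_i = \eta_i + \varepsilon_i \tag{4.2.4}$$

to introduce in the last relation one further abbreviation. And yet one
more abbreviation is to use Y for $(Y_1, \ldots, Y_n)$, $\varepsilon$ for $(\varepsilon_1, \ldots, \varepsilon_n)$, and X for
the matrix

$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1k} \\ X_{21} & X_{22} & \cdots & X_{2k} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix} \tag{4.2.5}$$

Note that the $X_i$'s are column vectors.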

The distribution of the $\varepsilon$'s is generally unknown. If the $\varepsilon_i$'s are
correlated, estimation of parameters may be much more difficult, the
estimators less reliable, and the reliability difficult to assess. We
therefore deal first with cases of independent $\varepsilon_i$'s, later investigating
estimation under various assumptions regarding correlation. For those
methods of estimation which require some assumption about the form of the
distribution of the $\varepsilon_i$'s or when the evaluation of the method requires
assumptions of the form, we shall invariably investigate first under the
assumption of normality. Usually we go no further. Fortunately many
aspects of the relationships between sample moments and moments of the
random variable being investigated do not depend on details of the form of
the distribution.

One trouble arises from too great a preoccupation with cases in which
the $\varepsilon$'s may be assumed to be normally distributed. Different criteria for
evaluating estimators may be expected to lead to different choices of
estimators to be used. We seem frequently to deal with criteria which in
general do suggest somewhat different estimators but which, when applied
to a case in which normal $\varepsilon$'s are assumed, lead to identical estimators. The
casual student is sometimes misled to assume that if the estimators are the
same the criteria must be essentially equivalent. Beware. We have used and
shall use assumptions other than normality not only to illuminate the
difference but also to give insight into the effect of wrongly assuming
normality.

To estimate parameters we must first gather data and then analyze them.
It is essential that experimental procedures be such that the data can be
analyzed. The form of the function to be used in (4.2.1) and the
experimental procedure must be developed together. “Design of
experiments” deals, for the most part, with choosing values of the X's to
facilitate analysis and to improve accuracy of estimation.





4.3 EXPECTED VALUES, VARIANCES, COVARIANCES

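With a sequence of random variables, $\varepsilon_1, \ldots, \varepsilon_n$, we associate expected
values $E(\varepsilon_1), \ldots, E(\varepsilon_n)$, variances $V(\varepsilon_1), \ldots, V(\varepsilon_n)$, and covariances
$\operatorname{cov}(\varepsilon_1, \varepsilon_2), \operatorname{cov}(\varepsilon_1, \varepsilon_3), \ldots, \operatorname{cov}(\varepsilon_{n-1}, \varepsilon_n)$. In Chapter 6 we use vector and
matrix forms to save space and to increase clarity:

$$E(\varepsilon) = [E(\varepsilon_1), \ldots, E(\varepsilon_n)] \tag{4.3.1}$$

$$\operatorname{cov}(\varepsilon) = \begin{bmatrix} V(\varepsilon_1) & \operatorname{cov}(\varepsilon_1, \varepsilon_2) & \cdots & \operatorname{cov}(\varepsilon_1, \varepsilon_n) \\ \operatorname{cov}(\varepsilon_1, \varepsilon_2) & V(\varepsilon_2) & \cdots & \operatorname{cov}(\varepsilon_2, \varepsilon_n) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\varepsilon_1, \varepsilon_n) & \operatorname{cov}(\varepsilon_2, \varepsilon_n) & \cdots & V(\varepsilon_n) \end{bmatrix} \tag{4.3.2}$$

In $\operatorname{cov}(\varepsilon)$ it is convenient to think of each covariance as occupying two
positions symmetrically situated with respect to the main diagonal.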

4.4 LINEAR PROBLEMS

If we can write $\eta(X, \beta)$ in the form

$$\eta = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \tag{4.4.1}$$

so that

$$Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \tag{4.4.2}$$

we find the problem of estimating $\beta$ simpler than otherwise. At the same
time, if $\eta$ is not linear in its parameters, we may find a form such as (4.4.1)
a useful approximation to $\eta(X, \beta)$ in the neighborhood of some particular
value of X and $\beta$. Chapters 5 and 6 deal with linear estimation.

4.5 LEAST SQUARES

In forms (4.2.2) or (4.2.3) or (4.4.2) we shall be interested in $E(Y_i \mid X_i)$.
It is

$$E(Y_i \mid X_i) = \eta(X_i, \beta) + E(\varepsilon_i) \tag{4.5.1}$$

If $E(\varepsilon_i) = \mu$, we can easily rewrite

$$E(Y_i \mid X_i) = [\eta(X_i, \beta) + \mu] + E(\varepsilon_i - \mu) \tag{4.5.2}$$

We see that (4.5.2) has the same form as (4.5.1) with $\eta(X_i, \beta)$ of (4.5.1)
replaced by $\eta(X_i, \beta) + \mu$ of (4.5.2) and $\varepsilon_i$ replaced by a new random
variable $\varepsilon_i - \mu$. If $\mu$ is known, it is just a number in (4.5.2). If $\mu$ is
unknown it plays the role of one of the $\beta$'s. We shall lose nothing and gain
simplicity if we assume $\varepsilon_i$ has expected value 0. We deal further with this
question in Section 5.10.

If $\eta(X, \beta)$ is a constant, say, $\eta(X, \beta) = \mu$ for all X, the problem of
estimation is one we handled in Chapter 3. Our estimate of $\mu$ is $\bar{Y}$ and if
the $\varepsilon$'s are normally distributed $\bar{Y}$ is the best estimator of $\mu$ from many
points of view. In looking for some property of $\bar{Y}$ which is at the same time
simple and generalizable, mathematicians a couple of centuries ago turned
to the fact that, if we have a set of numbers $Y_1, Y_2, \ldots, Y_n$, $\bar{Y}$ is the value of
the variable $\mu$ which minimizes $\sum_{i=1}^{n} (Y_i - \mu)^2$. As estimator of $\beta$ of (4.2.2)
they chose that $\hat{\beta}$ that minimizes

$$S = \lVert Y - \eta(X, \beta) \rVert^2 = \sum_{i=1}^{n} [Y_i - \eta(X_i, \beta)]^2 \tag{4.5.3}$$

with respect to changes in $\beta$. This method of estimation is known as the
ordinary least squares method. For (4.4.1) or (4.4.2) the computation of the
least squares estimates of $\beta$ can be described in a straightforward manner
without using successive approximations or iterations. If $\hat{\beta}$ is unique, it is
called the least squares estimator of $\beta$.
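The least squares idea can be checked numerically. The following is a minimal sketch, not taken from the text: the data values and the helper names `sum_of_squares` and `ols_slope` are illustrative assumptions. It shows that the sample mean gives a smaller sum of squares than nearby trial values for the constant model, and it evaluates the closed-form least squares estimate for the one-parameter linear model $\eta_i = \beta X_i$.

```python
def sum_of_squares(y, mu):
    """S(mu) = sum_i (Y_i - mu)^2 for the constant model eta = mu."""
    return sum((yi - mu) ** 2 for yi in y)


def ols_slope(x, y):
    """Closed-form least squares estimate for the model eta_i = beta * X_i."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Illustrative data (assumed, not from the text).
y = [2.1, 1.9, 2.4, 2.0, 1.6]
y_bar = sum(y) / len(y)

# S evaluated at the mean and at a few nearby trial values of mu.
s_at_mean = sum_of_squares(y, y_bar)
s_nearby = [sum_of_squares(y, y_bar + d) for d in (-0.1, -0.01, 0.01, 0.1)]

# One-parameter linear model: no iteration is needed.
x = [1.0, 2.0, 3.0, 4.0]
y_lin = [1.1, 1.9, 3.2, 3.9]
b = ols_slope(x, y_lin)

print("S at mean:", s_at_mean, "S nearby:", s_nearby)
print("least squares slope:", b)
```

For a model that is linear in its parameters the estimate comes from a single formula, which is the sense in which no successive approximations are required.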

4.6 GAUSS-MARKOV ESTIMATION 

If in (4.2.2) or (4.4.2) the $\varepsilon$'s are independent but do not all have the same
variance and if we know the proportion $\sigma_1^2 : \sigma_2^2 : \cdots : \sigma_n^2$, we would almost
certainly wish to consider in place of $\sum_{i=1}^{n} (Y_i - \eta_i)^2$,

$$S(\beta) = \sum_{i=1}^{n} \frac{(Y_i - \eta_i)^2}{\sigma_i^2} \tag{4.6.1}$$

in order that more accurate measurements be counted more heavily. The
minimization of S with respect to $\beta$ does not depend on the size of any
$\sigma_i^2$ but only on their proportion. In Chapter 5 we expand on this idea.
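The claim that only the proportions of the variances matter can be illustrated with a short sketch. It assumes the one-parameter model $\eta_i = \beta X_i$; the data, the variances, and the function name `weighted_slope` are hypothetical choices made for this illustration only.

```python
def weighted_slope(x, y, var):
    """Minimize sum_i (Y_i - beta*X_i)^2 / var_i for the model eta_i = beta*X_i."""
    num = sum(xi * yi / vi for xi, yi, vi in zip(x, y, var))
    den = sum(xi * xi / vi for xi, vi in zip(x, var))
    return num / den


# Illustrative data and error variances (assumed, not from the text).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.3, 3.6]
var = [0.01, 0.01, 0.04, 0.16]

b_w = weighted_slope(x, y, var)
# Multiplying every variance by the same factor leaves the minimizer unchanged.
b_scaled = weighted_slope(x, y, [100.0 * v for v in var])
# Equal weights reduce to ordinary least squares.
b_ols = weighted_slope(x, y, [1.0] * len(x))

print(b_w, b_scaled, b_ols)
```

The weighted and unweighted estimates differ because the last two observations, which have the larger variances, count less in the weighted sum.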

If the $\varepsilon$'s are not independent, the weighting that suggests itself involves
the covariances among the $\varepsilon$'s as well as the variances. We shall deal with
some such cases in later chapters. The sum of squares to be minimized is

$$S(\beta) = \sum_{i=1}^{n} \sum_{j=1}^{n} (Y_i - \eta_i)\, w_{ij}\, (Y_j - \eta_j)$$

where the $w_{ij}$'s are elements of the inverse of the variance-covariance
matrix of the $\varepsilon$'s. See Section 6.1.7, Gauss-Markov theorem.


4.7 SOME OTHER ESTIMATORS

If we know the form of the joint distribution of the $\varepsilon_i$'s and therefore of the
Y's, we can seek joint maximum likelihood estimators of the $\beta$'s. We may
sometimes need to estimate parameters of the distribution of the $\varepsilon_i$'s in the
process. Maximum likelihood estimators are discussed in Chapters 5 and 6
under various sets of assumptions about the distribution of the $\varepsilon$'s. We
shall find that, if the $\varepsilon_i$'s are normally independently distributed with zero
mean and common variance, the maximum likelihood estimators turn out
to be the ordinary least squares estimators. If the $\varepsilon_i$'s have a multivariate
normal distribution, not necessarily with equal variances or zero
covariances, but possessing known proportions among the variances and
covariances, the maximum likelihood estimators are generalizations of least
squares estimates.

If the conditional distribution of $Y_1, \ldots, Y_n$, given $\beta$, can be described by a
probability density function (or by a discrete probability function) which
has continuous second partial derivatives with respect to each $\beta_i$ and
jointly with respect to each pair $\beta_i, \beta_j$, then the first partial derivatives
will be zero at the maximum likelihood estimate $b_{ML}$ of $\beta$, that is,

$$\left. \frac{\partial f(Y \mid \beta)}{\partial \beta_i} \right|_{b_{ML}} = \left. \frac{\partial \ln f(Y \mid \beta)}{\partial \beta_i} \right|_{b_{ML}} = 0 \quad \text{for } i = 1, 2, \ldots, p \tag{4.7.1}$$


If the form of the distribution of the random variable $\varepsilon$ is known and it
is known that the parameter(s) ($\sigma$ for example) of the distribution of $\varepsilon$ are
chosen in accordance with a known probability distribution and if we
know we can adequately approximate the prior distribution of the
parameters, we can use Bayes's theorem to obtain the posterior
distribution, the MAP estimators, and the squared error loss estimators.

The MAP estimates are found by maximizing $f(\beta \mid Y)$. If there is a
unique $\beta$ which maximizes $f(\beta \mid Y)$, that is, if there is a unique mode, the
MAP estimate of $\beta$ is the mode of the posterior distribution of $\beta$. If the
distribution of $\beta$ is described by a probability density function which has
continuous second partial derivatives with respect to each $\beta_i$ and jointly
with respect to each pair of $\beta_i$'s, the first partial derivative will be zero at
the mode, that is,

$$\left. \frac{\partial f(\beta \mid Y)}{\partial \beta_i} \right|_{b_{MAP}} = 0 \quad \text{for } i = 1, 2, \ldots, p \tag{4.7.2}$$


The estimator which minimizes the expected value of the square of the
deviation of the estimator from the parameter being estimated is called the
squared error loss Bayes estimator and we symbolize it by $b_{SEL}$. For scalar
$\beta$, if the expected value of the posterior distribution of $\beta$ given Y exists, it
is $b_{SEL}$:

$$b_{SEL} = \int_{-\infty}^{\infty} \beta\, f(\beta \mid Y)\, d\beta \tag{4.7.3}$$

Another possible estimator is the median of the posterior distribution.
This estimator is associated with minimizing the expected absolute
deviation of estimator from estimated.

If the posterior distribution of $\beta$ is symmetric the median and mean
coincide. Some symmetric densities for scalar $\beta$ are shown in Fig. 4.1. For
such cases the $b_{SEL}$ vector defined by (4.7.3) is given by the mean or
median value of the conditional distribution of $\beta$ given Y, $f(\beta \mid Y)$. If the
density $f(\beta \mid Y)$ in addition to being symmetric is also unimodal, the mean,
median, and mode will all be at the same location. Hence when $f(\beta \mid Y)$ is
symmetric about the parameter vector $\beta$ and is also unimodal, $b_{SEL}$ is
equal to $b_{MAP}$. When the distribution is not symmetric or not unimodal,
$b_{SEL}$ and $b_{MAP}$ are rarely the same. Some nonsymmetric unimodal
probability densities are depicted in Fig. 4.2. Note that the modes do not
coincide with the means. This causes the parameters $b_{SEL}$ given by (4.7.3)
and associated with the mean to be not equivalent to those given by the
mode, which are indicated by (4.7.2).

Figure 4.1 Some symmetric conditional probability densities.

The conditional probability density $f(\beta \mid Y)$ used in (4.7.2) can be written
in terms of other densities using the form of Bayes's theorem written as

$$f(\beta \mid Y) = \frac{f(Y \mid \beta)\, f(\beta)}{f(Y)} \tag{4.7.4}$$

The probability density $f(\beta)$ contains the prior information known
regarding the parameter vector $\beta$. Notice that the parameters appear only
in the numerator of the right side of (4.7.4); this numerator can also be
written as

$$f(Y, \beta) = f(Y \mid \beta)\, f(\beta) \tag{4.7.5}$$

Then the necessary conditions given by (4.7.2) can be written equivalently
as

$$\left. \frac{\partial \ln[f(Y, \beta)]}{\partial \beta_i} \right|_{b_{MAP}} = \left. \frac{\partial \ln[f(Y \mid \beta)]}{\partial \beta_i} \right|_{b_{MAP}} + \left. \frac{\partial \ln[f(\beta)]}{\partial \beta_i} \right|_{b_{MAP}} = 0$$

since the maximum of $f(Y, \beta)$ occurs at the same location as the maximum
of its natural logarithm.

Figure 4.2 Some nonsymmetric conditional probability densities.

The estimators $b_{MAP}$ and $b_{SEL}$ are described without reference to the
linearity or nonlinearity of the expected value of Y in the $\beta$'s nor to the
independence of the $Y_i$'s. Under some assumptions about the structure of
$\eta$, and under some assumptions about the prior distribution of the $\beta$'s,
the MAP and SEL procedures are equivalent in arithmetic to certain least
squares or Gauss-Markov procedures.
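The relation among the mode (MAP), the mean (squared error loss), and the median of a posterior density can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a method from the text: the two densities, the grid limits, and the function name `posterior_summaries` are all hypothetical. The skewed density is proportional to $\beta^2 e^{-\beta}$ and the symmetric one is a normal shape centered at 3.

```python
import math


def posterior_summaries(density, lo, hi, n=20000):
    """Mode, mean, and median of an unnormalized scalar density on [lo, hi],
    computed on a midpoint grid (a crude numerical sketch)."""
    h = (hi - lo) / n
    grid = [lo + (k + 0.5) * h for k in range(n)]
    w = [density(b) for b in grid]
    total = sum(w)
    mode = grid[max(range(n), key=w.__getitem__)]
    mean = sum(b * wk for b, wk in zip(grid, w)) / total
    cum = 0.0
    median = grid[-1]
    for b, wk in zip(grid, w):
        cum += wk
        if cum >= 0.5 * total:
            median = b
            break
    return mode, mean, median


def skewed(b):
    # Nonsymmetric, unimodal shape (assumed for illustration).
    return b * b * math.exp(-b)


def symmetric(b):
    # Symmetric, unimodal shape centered at 3 (assumed for illustration).
    return math.exp(-0.5 * (b - 3.0) ** 2)


b_map, b_sel, b_med = posterior_summaries(skewed, 0.0, 40.0)
s_map, s_sel, s_med = posterior_summaries(symmetric, -7.0, 13.0)

print("skewed:    mode %.3f  mean %.3f  median %.3f" % (b_map, b_sel, b_med))
print("symmetric: mode %.3f  mean %.3f  median %.3f" % (s_map, s_sel, s_med))
```

For the symmetric unimodal shape the three summaries agree to within the grid spacing, while for the skewed shape the mode, median, and mean are clearly separated.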


4.8 COST 

Methods of collecting data and analyzing them must be coordinated. If 
observations are expensive, sophisticated methods of analysis to extract all 
pertinent information are justified. Sometimes more expensive methods of 
collecting data yield net returns by drastically reducing the cost of analy- 
sis. Increased costs due to collecting more data or using more sophisticated 
methods of analysis may or may not reduce the cost occasioned by the 
degree to which the estimate is incorrect. Some remarks in Chapter 3 were 
directed to these matters. 


4.9 MONTE CARLO METHODS 

One method for investigating the effects of nonlinearity or various other 
effects that are difficult to analyze otherwise is called the Monte Carlo 
method. Actually, what we describe is sometimes referred to as the “crude” 
Monte Carlo method. More sophisticated Monte Carlo methods often 
provide the same amount of information as the crude method but at a 
lower cost [1]. 

The Monte Carlo method can be used to investigate numerically the
properties of a proposed estimation method. To simulate a series of
experiments on the computer we proceed as follows:

1. Define the system by prescribing (a) the model equation, also called
regression function, (b) the way in which “errors” are incorporated in
the model of the observations, (c) the probability distribution of all the
errors and, where applicable, (d) a prior distribution. Assign “true”
values to all the parameters ($\beta$) in the regression function and to those
in the distribution of error.

2. Select a set of values of the independent variables. Then calculate the
associated set of “true” values of $\eta$ from the regression equations.

3. Use the computer to produce a set of errors $\varepsilon$ drawn from the
prescribed probability distribution. For most computers, programs are
available which can generate a stream of numbers that have all the
important characteristics of successive independent observations on a
population uniform over the interval (0, 1). Since they are generated by
a deterministic scheme, they are not actually random. Such numbers
are called pseudorandom numbers. Suitable transformations are used to
obtain samples for any other distribution.
To obtain a sequence of pseudorandom observations on a normal
population with expected value 0 and variance 1, we can make use of
the Box-Muller transformation [2]. If $u_{2i-1}$ and $u_{2i}$ are independent
uniform (0, 1) random numbers,

$$x_{2i-1} = (-2 \ln u_{2i-1})^{1/2} \cos(2\pi u_{2i}) \tag{4.9.1a}$$

and

$$x_{2i} = (-2 \ln u_{2i-1})^{1/2} \sin(2\pi u_{2i}) \tag{4.9.1b}$$

are independent random observations on a normal distribution with
expected value 0 and variance 1. The normal random numbers are
then adjusted to have the desired variances and covariances.
The simulated measurements are obtained by combining the errors
with the regression values. For additive errors the ith error is simply
added to the ith $\eta$ value. This then provides simulated measurements.

4. Acting as though the parameters are unknown, we estimate the
parameters, denoting the estimates $\beta^*$.

5. Replicate the series of simulated experiments N times by repeating
steps 3 and 4, each time with a new set of errors.

6. We use appropriate methods to estimate properties of the distribution
of parameter estimates. (We consider the estimates actually obtained
by our pseudorandom number scheme to be a random sample from
the distribution of all possible estimates.) The expected value of our
parameter estimator is estimated by the mean of our parameter
estimates,

$$\bar{\beta}_j^* = \frac{1}{N} \sum_{i=1}^{N} \beta_{ji}^* \tag{4.9.2}$$

where $\beta_{ji}^*$ is the jth component of the $\beta^*$ found on the ith replication.
If $\beta^*$ may be a biased estimator, $\bar{\beta}^* - \beta$ is an estimate of the bias. If it
is not clear whether or not $\beta^*$ is biased, the size of $\bar{\beta}^* - \beta$ needs to be
compared with an estimate of its variance-covariance matrix.
The variances and covariances of the distribution of $\beta^*$ may be
estimated by

$$\text{est. cov}(\beta_j^*, \beta_l^*) = \frac{1}{N-1} \sum_{i=1}^{N} (\beta_{ji}^* - \bar{\beta}_j^*)(\beta_{li}^* - \bar{\beta}_l^*) \tag{4.9.3a}$$

If $\beta^*$ is known to be unbiased, we can make use of our knowledge of $\beta$
and use a slightly more efficient estimator

$$\text{est. cov}(\beta_j^*, \beta_l^*) = \frac{1}{N} \sum_{i=1}^{N} (\beta_{ji}^* - \beta_j)(\beta_{li}^* - \beta_l) \tag{4.9.3b}$$

If $\beta^*$ is biased, the quantities on the right side of (4.9.3b), which are
estimates of mean square error and corresponding product moments,
may be more interesting than variances and covariances. If we use
actual experiments rather than simulated ones, (4.9.3b) will not be
available although (4.9.2) and (4.9.3a) are.
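The Box-Muller transformation used in step 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' program: the function names `box_muller_pair` and `normal_sample`, the seed, and the sample size are hypothetical, and Python's standard `random.Random` generator stands in for whatever uniform pseudorandom generator is available.

```python
import math
import random


def box_muller_pair(u1, u2):
    """Map two independent uniform (0, 1) numbers to two independent
    standard normal numbers, following (4.9.1a) and (4.9.1b)."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def normal_sample(n, mean=0.0, std=1.0, seed=0):
    """Return n pseudorandom normal numbers with the given mean and standard deviation."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
        u2 = rng.random()
        z1, z2 = box_muller_pair(u1, u2)
        # Shift and scale the standard normal numbers to the desired mean and variance.
        out.extend((mean + std * z1, mean + std * z2))
    return out[:n]


z = normal_sample(20000)
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / (len(z) - 1)
print("sample mean:", z_mean, "sample variance:", z_var)
```

The sample mean and variance should come out close to 0 and 1. Scaling by `std` handles unequal variances; producing correlated errors would require a further linear transformation that is not shown here.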

The flexibility of the above simulation procedure is great. We can 
estimate the sample properties for any model, linear or nonlinear, and for 
any parameter values. We can estimate the effect of different probability 
distributions upon ordinary least squares estimation or other estimation 
methods. Many other possibilities also exist. An example of a Monte Carlo 
simulation is given below and another one is given in Section 6.9. These 
simulations can be accomplished on a modern high-speed computer at a 
small fraction of the cost, in time and money, of a comparable set of 
physical experiments. 

The great power of the Monte Carlo procedure is that we can investigate
the properties of estimators in cases for which the character of the
estimators cannot be derived. To demonstrate the validity of a Monte
Carlo procedure an example is considered which is simple enough to be
analyzed without recourse to simulation. We investigate estimating $\beta$ in the
model $\eta_i = \beta X_i$ for the case of additive, zero mean, constant variance,
uncorrelated errors; that is,

$$Y_i = \eta_i + \varepsilon_i, \quad E(\varepsilon_i) = 0, \quad V(\varepsilon_i) = \sigma^2, \quad E(\varepsilon_i \varepsilon_j) = 0 \ \text{for } i \ne j$$

The distribution of $\varepsilon_i$ is uniform in the interval (-.5, .5); each $\varepsilon_i$ is found
using a pseudorandom number generator. There are no errors in $X_i$ and
there is no prior information.

The $X_i$ values are $X_i = i$ for $i = 1, 2, \ldots, 10$ and $\beta = 1$. For the kth set of
simulated measurements, $\beta_k^*$ is found using the ordinary least squares
estimator,

$$\beta_k^* = \frac{\sum_{i=1}^{10} Y_i X_i}{\sum_{i=1}^{10} X_i^2}$$

The estimated expected value of $\beta^*$, (4.9.2), the estimated variance of $\beta^*$,
(4.9.3a), and the estimated mean square error of $\beta^*$, (4.9.3b), are obtained
by using

$$\bar{\beta}^* = \frac{1}{N} \sum_{k=1}^{N} \beta_k^*, \qquad \text{est. } V(\beta^*) = \frac{1}{N-1} \sum_{k=1}^{N} (\beta_k^* - \bar{\beta}^*)^2$$

$$\text{est. mean square error}(\beta^*) = \frac{1}{N} \sum_{k=1}^{N} (\beta_k^* - 1)^2$$

For N independent sets of errors, estimates were calculated for N = 5, 25,
50, 100, 200, and 500. The results are shown in Table 4.1, where the
estimated standard deviation and estimated root mean square error are
given rather than their squares. In Table 4.2 comparable results for a
simulation involving normal errors are given. The variance of $\varepsilon_i$ in this
case was taken as 1/12, the same as the variance for the uniform case.

In both Tables 4.1 and 4.2 the sample mean $\bar{\beta}^*$ tends to approach the
true value of 1 as N becomes large. Hence $\beta^*$ is an unbiased estimator of
$\beta$. Also the estimated standard error of $\beta^*$ and estimated root mean square
error tend to their common exact value,
$[\sigma^2 / \sum X_i^2]^{1/2} = [(1/12)/385]^{1/2} \approx 0.0147$.



Table 4.1 Monte Carlo Simulation for $\eta_i = \beta X_i$, with $\beta = 1$ and $X_i = i$,
$i = 1, 2, \ldots, 10$. Uniform Distribution of Errors

Sample                   Est. Std. Dev.    Est. Root Mean
Size N    $\bar{\beta}^*$        ($\beta^*$)          Square Error ($\beta^*$)

   5      1.0044       0.00950           0.00958
  25      1.0014       0.01614           0.01589
  50      0.9992       0.01330           0.01339
 100      ...          ...               ...
 200      1.0018       0.01440           0.01448
 500      0.9987       0.01413           0.01419





Table 4.2 Monte Carlo Simulation for $\eta_i = \beta X_i$, with $\beta = 1$ and $X_i = i$,
$i = 1, 2, \ldots, 10$. Normal Distribution of Errors

Sample                   Est. Std. Dev.    Est. Root Mean
Size N    $\bar{\beta}^*$        ($\beta^*$)          Square Error ($\beta^*$)

   5      1.0021       0.01156           0.01055
  25      0.9969       0.01608           0.01606
  50      0.9972       0.01496           0.01507
 100      0.9973       0.01486           0.01502
 200      0.9995       0.01410           0.01407
 500      0.9997       0.01480           0.01478


This example shows that the number of simulations N must be quite large 
in order to provide accurate estimates of the variance of the parameter 
estimate. Such simulations are still inexpensive compared to actual experi- 
ments to determine the variance. Moreover, methods are available for 
making the simulation procedure more efficient [1].
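The example of this section can be repeated with a short program. The sketch below is a reconstruction under assumptions, not the authors' code: the function name `simulate`, the seed, and the choice of 5000 replications are hypothetical, and Python's `random.Random` is used as the uniform pseudorandom generator. It follows the stated setup ($\eta_i = \beta X_i$, $X_i = i$ for $i = 1, \ldots, 10$, $\beta = 1$, errors uniform on (-.5, .5)) and computes the three summaries described above.

```python
import math
import random


def simulate(n_rep, beta=1.0, seed=1):
    """Crude Monte Carlo study of the least squares estimator of beta
    in eta_i = beta * X_i with X_i = 1, ..., 10 and uniform(-0.5, 0.5) errors."""
    rng = random.Random(seed)
    x = list(range(1, 11))
    sxx = sum(xi * xi for xi in x)
    estimates = []
    for _ in range(n_rep):
        # Step 3: simulated measurements = true values + new errors.
        y = [beta * xi + (rng.random() - 0.5) for xi in x]
        # Step 4: estimate beta as though it were unknown.
        estimates.append(sum(xi * yi for xi, yi in zip(x, y)) / sxx)
    mean = sum(estimates) / n_rep
    std = math.sqrt(sum((b - mean) ** 2 for b in estimates) / (n_rep - 1))
    rmse = math.sqrt(sum((b - beta) ** 2 for b in estimates) / n_rep)
    return mean, std, rmse


# Exact standard deviation of the estimator: sqrt(sigma^2 / sum X_i^2), sigma^2 = 1/12.
exact_std = math.sqrt((1.0 / 12.0) / 385.0)
mean, std, rmse = simulate(5000)
print("mean %.4f  est. std %.5f  est. rmse %.5f  exact std %.5f"
      % (mean, std, rmse, exact_std))
```

The numbers will not reproduce the tables, since the pseudorandom streams differ, but with several thousand replications the estimated standard deviation should settle near the exact value.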


REFERENCES 

1. Hammersley, J. M. and Handscomb, D. C., Monte Carlo Methods, Methuen & Co. Ltd., 
London, 1964. 

2. Box, G. E. P. and Muller, M. E., “A Note on the Generation of Random Normal
Deviates,” Ann. Math. Stat., 29 (1958), 610-611.



CHAPTER 5

INTRODUCTION TO LINEAR ESTIMATION


5.1 MOTIVATION, MODELS, AND ASSUMPTIONS

5.1.1 Motivation 

One of the basic principles in engineering is to start analysis with simple
cases. For that reason estimation of parameters in several simple linear
algebraic models is studied in this chapter. Many of the estimation ideas
can be introduced in connection with these models without the added
complexities introduced by nonlinear algebraic models or by models
described by differential equations.

In addition to the pedagogic value of simple algebraic cases, there are
numerous physical situations for which the regression function is linear in
the parameters. Moreover, when the regression function is unknown and
cannot be derived from first principles, simple models are usually
proposed.

Simple linear models have been widely studied by statisticians,
economists, and others. Various terms designating certain parts of the
study of estimation of parameters in statistical models have also been used
to refer to much larger segments of that study. When the models are linear
in the parameters, regression analysis and analysis of variance are
sometimes used interchangeably. However, regression analysis also
specifically refers to the analysis of the dependence of the expected value
of a random variable on the conditions* under which the experiment is
conducted; the method of least squares is frequently used to estimate the
parameters.
Analysis of variance refers to the breakdown of the variability of the 
observed values of the dependent variable into a part which is the sum of 
squares about the fitted regression function and other parts due to the 
exclusion of parameters or groups of parameters from the regression 
function. Those using analysis of variance methods when the independent 
variables are limited in possible values to 0 and 1 (presence or absence) 
tend to be unaware that a model is implied [1, p. 243]. Analysis of 
covariance uses a combination of techniques which are specially adapted to 
0 or 1 independent variables and techniques needed in more general cases. 

5.1.2 Models 

Certain aspects of models are discussed in this section. First considered is 
the model functional form, which is termed the regression function. Some 
restrictions on designs for these functions are also given. Second, two error 
models are discussed. In one there are measurement errors and in the 
other the random component is in the equation describing the system. 
Third, in the next subsection various standard assumptions relating to the 
statistics of the errors are given. 

The regression functions for the cases used are considered to have the
correct functional forms, that is, not empirical approximations or best
guesses. The functions considered in this chapter are linear in the
parameters and contain at most two parameters. For convenience in later
references, the regression functions used in this chapter are listed and
labeled as follows:

Model 1,  $\eta_i = \beta_0$                                   (5.1.1a)

Model 2,  $\eta_i = \beta_1 X_i$                               (5.1.1b)

Model 3,  $\eta_i = \beta_0 + \beta_1 X_i$                     (5.1.1c)

Model 4,  $\eta_i = \beta_0' + \beta_1 (X_i - \bar{X})$        (5.1.1d)

Model 5,  $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$           (5.1.1e)

The variable $\eta$ is sometimes called the dependent variable†; $X_i$, $X_{i1}$, and
$X_{i2}$ are independent variables that might represent time, position,
temperature, velocity, cost, and so on. Clearly some of these models are
related. For example, Model 2 reduces to Model 1 if $X_i = 1$. Also, Model 5
includes both Models 3 and 4.

* Conditions refer, for example, to the $X_i$ values in (5.1.1c).

† In the statistical literature Y is called the dependent variable.

In each case there is a restriction related to the measurements. Assume
that there are n observations. For Model 1 the restriction is simply that
there is at least one observation or $n \ge 1$. For Models 2, 3, 4, and 5 the
respective restrictions are as follows:

$\sum_{i=1}^{n} X_i^2 \ne 0$  (at least one $X_i \ne 0$ needed)                          (5.1.2a)

$\sum_{i=1}^{n} (X_i - \bar{X})^2 \ne 0$  (at least 2 different $X_i$ values needed)     (5.1.2b)

$\sum_{i=1}^{n} (X_i - \bar{X})^2 \ne 0$  (at least 2 different $X_i$ values needed)     (5.1.2c)

$\sum X_{i1}^2 \sum X_{i2}^2 - (\sum X_{i1} X_{i2})^2 \ne 0$  (at least 2 different sets of $X_{i1}, X_{i2}$)  (5.1.2d)

where $\bar{X} = \sum_{i=1}^{n} X_i / n$.

In each of the models, except the first, the independent variables $X_i$ or
$X_{ij}$ could represent a number of equally or unequally spaced values.
Alternately, $X_i$ might represent values of various functions of time, t, such
as

$3t_i$, $\sin at_i$, $\cos at_i$, $e^{at_i}$, $\ln t_i$,

or some combination of them. The quantity a is here assumed to be
known.

In most of this chapter the errors are considered to be additive. Then for
Model 3

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{5.1.3}$$

where $\varepsilon_i$ is the unknown error and $Y_i$ is the measurement at $X_i$. The model
given by (5.1.3) can, however, represent the following two cases.

Error Model A. Errors in Measurements

$$\eta_i = \beta_0 + \beta_1 X_i, \qquad Y_i = \eta_i + \varepsilon_i \tag{5.1.4}$$

Error Model B. Errors (Noise) in Process

$$\eta_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad Y_i = \eta_i \tag{5.1.5}$$

where $\eta_i$ represents the quantity being measured and $Y_i$ is its measurement.
Implicit in these models is the assumption that there is no error in $X_i$; that
is, $X_i$ is not a random variable as are $Y_i$ and $\varepsilon_i$. In Error Model B, $\eta_i$ is also
a random variable.

In Error Model A there are errors in the measurements but there is none
in $\eta$. In order to quantify $\varepsilon_i$ one can study the error characteristics of the
measuring devices, be they thermocouples, hot-wire anemometers,
micrometers, etc. These errors can be reduced by more precise devices. As
technology improves, one would expect $\varepsilon_i$ in Error Model A to decrease.
The system model itself is assumed to be errorless or noiseless. This implies
that the physics is well-understood and that there is no stochastic noise
entering in $\eta$. This would be the case for many physical measurements.
Consider, for example, the steady state temperature distribution in a flat
plate which is linear with position. The randomness in observed
temperatures for repeated measurements would be the result of
measurement noise rather than some physical phenomenon causing the
fluctuation.

In Error Model B the measurements are assumed errorless, but the
model ($\eta$) contains “noise”; that is, the variable being measured deviates
by some stochastic component from its expected value. An example is
turbulent flow between two parallel plates. Part of the universal velocity
profile for turbulent flow is described by an expression in which the
dependent variable $u^+$ is a dimensionless velocity and $y^+$, the
independent variable, is a dimensionless distance. In this case
instantaneous velocity measurements fluctuate about the mean value $u^+$
owing more to the turbulence phenomenon than to measurement
inaccuracies. Hence this is an Error Model B case. For Error Model B, $\varepsilon_i$
would not be expected to decrease with time (that is, with improved
measurement capability). Also a study of the sensor would not yield any
information regarding $\varepsilon_i$.

Regardless of whether Error Model A or B is correct, the estimation
problem is formally the same for the physical models considered in this
chapter. The meaning of $\eta$ and $\varepsilon$ is different, however, as are the statistics
for $\varepsilon$. We shall visualize Error Model A as the model considered in this
chapter.

5.1.3 Statistical Assumptions Regarding the Measurement Errors

Assumptions regarding the measurement errors should be carefully stated
in each estimation problem. If the assumptions do not accurately describe
the data, then one can at least pinpoint the assumption(s) which are not
satisfied. The mere identification of the incorrect assumptions may lead to
more realistic assumptions and thus better estimators.

Different assumptions lead to different estimation methods. In this
chapter we consider three commonly used methods: ordinary least squares
(OLS), maximum likelihood (ML), and maximum a posteriori (MAP). The
following conditions, given in terms of Error Model A and Model 3, are
termed the standard statistical assumptions for $i = 1, 2, \ldots, n$:

1. $Y_i = E(Y_i \mid \beta_0, \beta_1) + \varepsilon_i = \eta_i + \varepsilon_i$  (additive errors)  (5.1.6)

2. $E(\varepsilon_i) = 0$  (zero mean errors)  (5.1.7)

3. $V(\varepsilon_i) = \sigma^2$  (constant variance errors, homoskedasticity)  (5.1.8)
   [Note $E(\varepsilon_i^2) = \sigma^2$ if $E(\varepsilon_i) = 0$.]

4. $E\{[\varepsilon_i - E(\varepsilon_i)][\varepsilon_j - E(\varepsilon_j)]\} = 0$ for $i \ne j$  (uncorrelated errors)  (5.1.9)
   [or $E(\varepsilon_i \varepsilon_j) = 0$ if $E(\varepsilon_i) = 0$ and $i \ne j$.]

5. $\varepsilon_i$ has a normal probability distribution  (5.1.10)

6. Known statistical parameters  (5.1.11)

7. $V(X_i) = 0$  (nonstochastic independent variable)  (5.1.12)

8. No prior information regarding $\beta_0$ and $\beta_1$, and parameters nonrandom  (5.1.13)

In order to describe the assumptions concisely and explicitly we assign a 1
or 0 to the above assumptions, where 1 means yes and 0 no. For a case
when all the assumptions are satisfied we designate them as 11111111, where
the first 1 on the left refers to the additive error assumption, the second 1
refers to the zero mean assumption, etc. In some cases additional numbers
are used to indicate more information than a simple no. For example, for
the uncorrelated error condition 2 designates first order autoregressive
errors. See Section 6.1.5 for a more complete list of possibilities other than
1 or 0. If an assumption is not used then a dash will be used in lieu of a 1
or 0.
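The eight-character designation described above is easy to generate mechanically. The following sketch is only an illustration of the notation: the function name `designation` and the dictionary-based interface are hypothetical conveniences, not something defined in the text.

```python
ASSUMPTIONS = (
    "additive errors",
    "zero mean errors",
    "constant variance errors",
    "uncorrelated errors",
    "normal errors",
    "known statistical parameters",
    "nonstochastic independent variable",
    "no prior information, nonrandom parameters",
)


def designation(status):
    """Build the eight-character code for the standard assumptions.

    `status` maps an assumption number (1 to 8) to True (written 1),
    False (written 0), or another integer code such as 2. Assumption
    numbers missing from `status` are treated as not used and are
    written as a dash.
    """
    chars = []
    for k in range(1, len(ASSUMPTIONS) + 1):
        chars.append(str(int(status[k])) if k in status else "-")
    return "".join(chars)


all_valid = designation({k: True for k in range(1, 9)})
only_1_2_7_8 = designation({1: True, 2: True, 7: True, 8: True})
autoregressive = designation({1: True, 2: True, 3: True, 4: 2,
                              5: False, 6: True, 7: True, 8: True})
print(all_valid, only_1_2_7_8, autoregressive)
```

Here the third call uses the code 2 in the fourth position, which the text associates with first order autoregressive errors.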

Assumptions 2, 3, 4, and 7 are sometimes referred to as the Gauss- 
Markov assumptions 





5.2 ORDINARY LEAST SQUARES ESTIMATORS (OLS) 


In ordinary least squares estimation the sum of squares function to be
minimized with respect to the parameters is simply

$$S = \sum_{i=1}^{n} (Y_i - \eta_i)^2 \tag{5.2.1}$$

where $\eta_i$ is a function of the parameters such as $\beta_0$ and $\beta_1$.

It is important to observe that no statistical assumptions are used in
obtaining OLS parameter estimates, that is, the assumptions are --------. In
order to make statistical statements regarding the estimators it is necessary
to possess information regarding the measurement errors, however.

In derivations to be given we may need the variance of $\sum d_i Y_i$ where $d_i$ is
not a random variable. Assume that the errors in $Y_i$ are additive, have zero
mean, and are uncorrelated (assumptions 1, 2, and 4, respectively). Then

$$\begin{aligned} V\left( \sum_{i=1}^{n} d_i Y_i \right) &= V\left[ \sum_{i=1}^{n} d_i (\eta_i + \varepsilon_i) \right] \\ &= E\left\{ \left[ \sum_{i=1}^{n} d_i (\eta_i + \varepsilon_i) - \sum_{i=1}^{n} d_i \eta_i \right]^2 \right\} = E\left[ \left( \sum_{i=1}^{n} d_i \varepsilon_i \right)^2 \right] \\ &= \sum_{i=1}^{n} d_i^2 V(\varepsilon_i) \end{aligned} \tag{5.2.2}$$

where the first assumption is used on the first line of (5.2.2), the second
assumption on the second line, and the fourth on the third line. (5.2.2) is a
special case of (2.6.20).
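The variance formula (5.2.2) can be checked by simulation. The sketch below makes several assumptions that are not in the text: the coefficients, true values, and standard deviations are arbitrary illustrative numbers, the errors are drawn from normal distributions (the formula itself only requires additive, zero mean, uncorrelated errors), and the function names are hypothetical.

```python
import random


def var_linear_combination(d, sigma2):
    """Right side of (5.2.2): sum_i d_i^2 V(eps_i)."""
    return sum(di * di * s2 for di, s2 in zip(d, sigma2))


def simulated_variance(d, eta, sigma, n_rep=20000, seed=2):
    """Sample variance of sum_i d_i Y_i with Y_i = eta_i + eps_i,
    where the eps_i are independent, zero mean, and normal (an assumption)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_rep):
        totals.append(sum(di * (ei + rng.gauss(0.0, si))
                          for di, ei, si in zip(d, eta, sigma)))
    m = sum(totals) / n_rep
    return sum((t - m) ** 2 for t in totals) / (n_rep - 1)


# Illustrative values (assumed).
d = [0.5, -1.0, 2.0]
eta = [1.0, 4.0, -3.0]
sigma = [0.2, 0.1, 0.3]

exact = var_linear_combination(d, [s * s for s in sigma])
approx = simulated_variance(d, eta, sigma)
print("formula:", exact, "simulation:", approx)
```

The nonrandom values $\eta_i$ shift the sum but do not affect its variance, which is why they drop out of (5.2.2).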


5.2.1 Models 1 and 2 ($\eta_i = \beta_0$ and $\eta_i = \beta_1 X_i$)

Both Models 1 and 2 are covered in this section. Since Model 2 is the more
general, we start with it and then apply the results to Model 1. For Model
2 ($\eta_i = \beta_1 X_i$), (5.2.1) can be written

$$S = \sum_{i=1}^{n} (Y_i - \beta_1 X_i)^2 \tag{5.2.3}$$





Differentiating $S$ with respect to $\beta_1$, replacing $\beta_1$ by the estimator $b_1$, and
setting equal to zero give the normal equation,

$$\sum Y_i X_i - b_1 \sum X_i^2 = 0 \tag{5.2.4}$$

whose solution for Model 2 is

$$b_1 = \frac{\sum Y_i X_i}{\sum X_i^2} \tag{5.2.5}$$

By setting $X_i = 1$ in (5.2.5) the Model 1 estimator is

$$b_0 = \frac{1}{n}\sum Y_i = \bar{Y} \tag{5.2.6}$$

which is the average $Y_i$. For these two estimators, no statistical assumptions
are used, but at least one observation must be made, and in the case
of Model 2, at least one $X_i$ must not be zero.

The predicted, regression, or smoothed value is denoted $\hat{Y}_i$ and is called
"$Y_i$ hat." For Models 1 and 2, respectively, $\hat{Y}_i$ is

$$\hat{Y}_i = b_0, \qquad \hat{Y}_i = b_1 X_i \tag{5.2.7a,b}$$

The residual $e_i$ is the measured value of $Y_i$ minus the predicted value or

$$e_i \equiv Y_i - \hat{Y}_i \tag{5.2.8}$$

The residual $e_i$ is not equal to the error $\varepsilon_i$ but it can be used to estimate $\varepsilon_i$.

5.2.1.1 Mean and Variances of Estimates

Using the standard statistical assumptions of additive, zero mean errors
and nonstochastic $X_i$, $\beta_0$, and $\beta_1$ (11----11), we get for the expected value of
the Model 2 parameter

$$E(b_1) = \frac{\sum X_i E(Y_i)}{\sum X_i^2} = \frac{\sum X_i (\beta_1 X_i)}{\sum X_i^2} = \beta_1$$

One can also show for Model 1 that $E(b_0) = \beta_0$. Hence the least squares
estimators $b_0$ and $b_1$ are unbiased for the stated assumptions (see Section
3.2.1).









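Suppose that all the standard assumptions are valid except that $\varepsilon_i$ need
not possess a normal density and $\sigma^2$ may or may not be known (assumptions
1111--11); then the variance of $b_0$ using (5.2.6) and (5.2.2) is

$$V(b_0) = \sum_{i=1}^{n} n^{-2}\sigma^2 = \frac{\sigma^2}{n} \tag{5.2.9}$$

From (5.2.5) and (5.2.2) the variance of $b_1$ is

$$V(b_1) = \sum_{i=1}^{n}\left[X_i\left(\sum_{j=1}^{n} X_j^2\right)^{-1}\right]^2 \sigma^2 = \frac{\sigma^2}{\sum X_j^2} \tag{5.2.10}$$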

Notice that (5.2.9) and (5.2.10) both indicate that estimates as accurate as 
desired can be obtained by simply taking a sufficiently large number of 
observations. This naturally requires that the underlying assumptions be 
valid. If the measurements were correlated, for example, this conclusion 
might not be true. 

Also note that for Model 2 ($\eta_i = \beta_1 X_i$) there is optimum placement of
observations. Suppose that $n$ observations are to be obtained and it is
desired to obtain a minimum variance estimate by selecting the $X_i$ so that
$|X_i| \le |X_{\max}|$. Then the variance of $b_1$ is minimized if all the measurements
are concentrated at $X_{\max}$, giving $V(b_1) = \sigma^2 / (n X_{\max}^2)$. This would be the best
choice of the $X_i$ values provided there is no uncertainty in the model (i.e.,
functional form of $\eta_i$).

Suppose that all the standard assumptions are valid except there may or
may not be normality and $\sigma^2$ is unknown (1111-011). Then the variances of
$b_0$ and $b_1$ are estimated by replacing $\sigma^2$ by an estimate which is designated
$s^2$. The square roots of $V(b_0)$ and $V(b_1)$ with this replacement are called
the estimated standard errors (or standard deviations).

$$\text{est. s.e.}(b_0) = s\, n^{-1/2} \tag{5.2.11}$$

$$\text{est. s.e.}(b_1) = s\left[\sum_{i=1}^{n} X_i^2\right]^{-1/2} \tag{5.2.12}$$


5.2.1.2 Expected Value of $S_{\min}$

An estimator for $\sigma^2$ is not directly obtained using OLS as it is using ML
estimation. One can, however, for the assumptions 1111-011 relate the
expected value of the minimum sum of squares, designated $S_{\min}$, to $\sigma^2$.





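Since $E(Y_i - \hat{Y}_i) = 0$,

$$V(e_i) = E\left[(Y_i - \hat{Y}_i)^2\right] \tag{5.2.13a}$$

and thus the expected value of $S_{\min}$ is

$$E(S_{\min}) = E\left[\sum (Y_i - \hat{Y}_i)^2\right] = \sum E\left[(Y_i - \hat{Y}_i)^2\right] = \sum V(e_i) \tag{5.2.13b}$$

(5.2.13b) is valid for any number of parameters. It still remains to find
$V(e_i)$ in terms of $\sigma^2$. It is always true that

$$V(e_i) = V(Y_i - \hat{Y}_i) = V(Y_i) + V(\hat{Y}_i) - 2\,\mathrm{cov}(Y_i, \hat{Y}_i) \tag{5.2.14}$$

The $V(Y_i)$ term is simply $\sigma^2$. The other two terms are considered below.
For the one-parameter models we can write $\hat{Y}_i = b_1 X_i = X_i \sum_j d_j Y_j$, with
$d_j = X_j \left[\sum_k X_k^2\right]^{-1}$, so that

$$V(\hat{Y}_i) = X_i^2 \sum_j d_j^2 \sigma_j^2 \tag{5.2.15}$$

using (5.2.2). For constant error variance $\sigma^2$ the variance of $\hat{Y}_i$ for Model 2
is

$$V(\hat{Y}_i) = X_i^2 \sigma^2 \left[\sum X_j^2\right]^{-1} \tag{5.2.16}$$

We have for Model 1 ($\eta_i = \beta_0$),

$$V(\hat{Y}_i) = \frac{\sigma^2}{n} \tag{5.2.17}$$

Observe that the variance of the predicted value of $Y_i$ is a constant for
Model 1 but increases with $X_i^2$ for Model 2.

The third term on the right side of (5.2.14) for assumptions 1111--11 and
Model 2 is

$$-2\,\mathrm{cov}(Y_i, \hat{Y}_i) = -2 X_i d_i \sigma^2 = -2 X_i^2 \left[\sum X_j^2\right]^{-1} \sigma^2 \tag{5.2.18}$$

Combining the above results yields for Models 1 and 2, respectively,

$$V(e_i) = \sigma^2\left(1 - \frac{1}{n}\right), \qquad V(e_i) = \sigma^2\left(1 - \frac{X_i^2}{\sum X_j^2}\right) \tag{5.2.19a,b}$$

which are both less than $V(\varepsilon_i) = \sigma^2$. In both cases the expected value of
$S_{\min}$ is found using (5.2.13) and (5.2.19) to be

$$E(S_{\min}) = (n - 1)\sigma^2 \tag{5.2.20}$$

and thus an unbiased estimator for $\sigma^2$, designated $s^2$ or $\hat{\sigma}^2$, is

$$s^2 = \hat{\sigma}^2 = \frac{S_{\min}}{n - 1} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 1}, \qquad n > 1 \tag{5.2.21}$$

This expression is valid for one parameter with assumptions 1111-011 and
can be used in (5.2.11) or (5.2.12). For one parameter, $\sigma^2$ can be estimated
only by using two or more observations.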

Example 5.2.1 

An automobile is traveling at a constant speed and the distances traveled at the 
end of 1, 2, and 3 min are measured to be 1.01, 2.03, and 3.00 km. Assume that 
distance is the dependent variable and time the independent variable. The
regression function for this case is that the distance traveled, $h$, is equal to
the velocity, $v$, times the duration traveled, $t$; in symbols, $h = vt$. Use OLS to
estimate $v$.

Solution

This is a Model 2 case with $v$ being the parameter. Using (5.2.5) with $Y_i$ being the $h_i$
measurement, we find

$$\hat{v} = \left[\sum_{i=1}^{3} Y_i t_i\right]\left[\sum_{i=1}^{3} t_i^2\right]^{-1} = [1.01(1) + 2.03(2) + 3(3)][1 + 4 + 9]^{-1} = 1.005 \text{ km/min}$$

where $Y_i$ is the observation of $h_i$.
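
The arithmetic of Example 5.2.1 can be checked with a few lines of Python. This is only a sketch of (5.2.5) and (5.2.8); the function name `ols_one_parameter` and the variable names are illustrative choices, not notation from the text.

```python
def ols_one_parameter(x, y):
    """OLS estimate b1 for the model eta_i = beta_1 * X_i, as in (5.2.5)."""
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0.0:
        # (5.2.5) needs at least one nonzero X_i
        raise ValueError("at least one X_i must be nonzero")
    return sum(xi * yi for xi, yi in zip(x, y)) / sum_xx


t_min = [1.0, 2.0, 3.0]       # times, min
h_km = [1.01, 2.03, 3.00]     # measured distances, km

v_hat = ols_one_parameter(t_min, h_km)
# Residuals e_i = Y_i - Y_i_hat, as in (5.2.8)
residuals = [h - v_hat * t for t, h in zip(t_min, h_km)]

print(f"v_hat = {v_hat:.3f} km/min")
print("residuals:", [round(e, 3) for e in residuals])
```

Note that the residuals of a Model 2 fit need not sum to zero, since the model has no constant term.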

Example 5.2.2

An object is dropped in a vacuum and the position $h$ is observed at various times $t_i$.
The observations of $h_i$, designated $Y_i$, are given as

    t_i (sec)    0.1     0.2     0.3     0.4
    Y_i (m)      0.05    0.2     0.4     0.8

The measurements are to be used to estimate the local gravitational constant $g$.

The position $h$ is described by the differential equation $\ddot{h} = g$ and the initial
conditions $h = \dot{h} = 0$ at $t = 0$; the solution for $h$ is $h = gt^2/2$.

(a) Using ordinary least squares, find an estimate of $g$.

(b) Using the standard assumptions except that $\sigma^2$ is unknown and $\varepsilon_i$ need not
be normal, give an estimate of the standard error of $\hat{g}$.

Solution

(a) The given model is the same as Model 2 with $g$ being $\beta_1$ and $X_i$ being $t_i^2/2$. The
estimator for OLS is (5.2.5), which can be written as

$$\hat{g} = \left[\sum_{i=1}^{4} Y_i \frac{t_i^2}{2}\right]\left[\sum_{i=1}^{4} \frac{t_i^4}{4}\right]^{-1}$$

Then the numerator and denominator are, respectively,

$$\sum Y_i \frac{t_i^2}{2} = \frac{1}{2}\left[0.05(0.1)^2 + 0.2(0.2)^2 + \cdots + 0.8(0.4)^2\right] = 0.08625$$

$$\sum \frac{t_i^4}{4} = \frac{1}{4}\left[(0.1)^4 + (0.2)^4 + (0.3)^4 + (0.4)^4\right] = 0.00885$$

and thus the estimate is $\hat{g} = 0.08625/0.00885 = 9.7458$ m/sec$^2$.

(b) The residuals, $e_i = Y_i - \hat{Y}_i$, are, respectively, 0.001271, 0.00508, -0.03855, and
0.02034, and the sum of squares of these terms is 0.001928. From (5.2.21) the
estimated standard deviation is

$$s = \left[\frac{S_{\min}}{n - 1}\right]^{1/2} = \left[\frac{0.001928}{4 - 1}\right]^{1/2} = 0.02535$$

and then from (5.2.12) the estimated standard error of $\hat{g}$ is

$$\text{est. s.e.}(\hat{g}) = s\left[\sum_{i=1}^{4} \frac{t_i^4}{4}\right]^{-1/2} = 0.2695 \text{ m/sec}^2$$

which can be compared with the estimate of 9.7458 m/sec$^2$.
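
The steps of Example 5.2.2 can be reproduced numerically as a check. The sketch below applies (5.2.5), (5.2.21), and (5.2.12) with $X_i = t_i^2/2$; the variable names are illustrative and not part of the text.

```python
import math

t = [0.1, 0.2, 0.3, 0.4]      # times, sec
y = [0.05, 0.2, 0.4, 0.8]     # observed positions, m
n = len(t)

# Model 2 with X_i = t_i**2 / 2 and beta_1 = g
x = [ti ** 2 / 2 for ti in t]
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

g_hat = sum_xy / sum_xx                              # (5.2.5)
residuals = [yi - g_hat * xi for xi, yi in zip(x, y)]
s_min = sum(e * e for e in residuals)                # minimum sum of squares
s = math.sqrt(s_min / (n - 1))                       # (5.2.21), one parameter
se_g = s / math.sqrt(sum_xx)                         # (5.2.12)

print(f"g_hat = {g_hat:.4f} m/sec^2")
print(f"S_min = {s_min:.6f}, s = {s:.5f}")
print(f"est. s.e.(g_hat) = {se_g:.4f} m/sec^2")
```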


5.2.2 Two-Parameter Models

5.2.2.1 Model 5, $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$

In order to simplify the presentation of the two-parameter cases, the
general two-parameter case, Model 5, is considered first. Using the sum of
squares function, (5.2.1), with Model 5, (5.1.1e), we have

$$S = \sum_{i=1}^{n} (Y_i - \beta_1 X_{i1} - \beta_2 X_{i2})^2 \tag{5.2.22}$$

We differentiate $S$ with respect to $\beta_1$, set the derivative equal to zero,
and replace $\beta_1$ by its estimator $b_1$ and $\beta_2$ by $b_2$. Repeating the same
procedure for $\beta_2$ yields the two normal equations

$$b_1 c_{11} + b_2 c_{12} = d_1 \tag{5.2.23a}$$

$$b_1 c_{12} + b_2 c_{22} = d_2 \tag{5.2.23b}$$

where

$$c_{kl} \equiv \sum_{i=1}^{n} X_{ik} X_{il}, \qquad d_k \equiv \sum_{i=1}^{n} Y_i X_{ik} \tag{5.2.23c}$$

Notice that the coefficient $c_{12}$ appears in a symmetric manner in (5.2.23a,
b). Solving (5.2.23a, b) for $b_1$ and $b_2$ yields (for Model 5)

$$b_1 = \frac{d_1 c_{22} - d_2 c_{12}}{\Delta} \tag{5.2.24a}$$

$$b_2 = \frac{d_2 c_{11} - d_1 c_{12}}{\Delta} \tag{5.2.24b}$$

$$\Delta \equiv c_{11} c_{22} - c_{12}^2$$

No statistical assumptions were necessary to derive the estimators given in
(5.2.24). Using the three standard assumptions of additive, zero mean
errors and nonstochastic $X_i$ it can be shown that $b_1$ and $b_2$ are unbiased
estimates of $\beta_1$ and $\beta_2$.

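The variance of $b_1$ can be readily found by writing $b_1$ as

$$b_1 = \sum (f_i - g_i) Y_i, \qquad f_i \equiv \frac{c_{22} X_{i1}}{\Delta}, \qquad g_i \equiv \frac{c_{12} X_{i2}}{\Delta} \tag{5.2.25}$$

Then using the standard statistical assumptions 1111--11 and (5.2.2) the
variance of $b_1$ is

$$V(b_1) = \sum (f_i - g_i)^2 \sigma^2 = \sum (f_i^2 - 2 f_i g_i + g_i^2)\sigma^2$$

or simplifying gives

$$V(b_1) = \left[c_{22}^2 c_{11} - 2 c_{22} c_{12}^2 + c_{12}^2 c_{22}\right]\frac{\sigma^2}{\Delta^2} = \frac{c_{22}\sigma^2}{\Delta} \tag{5.2.26a}$$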






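In a similar manner it can be shown that $V(b_2)$ and $\mathrm{cov}(b_1, b_2)$ are given by

$$V(b_2) = \frac{c_{11}\sigma^2}{\Delta}, \qquad \mathrm{cov}(b_1, b_2) = -\frac{c_{12}\sigma^2}{\Delta} \tag{5.2.26b,c}$$

The predicted value of $Y_i$ is $\hat{Y}_i$,

$$\hat{Y}_i = b_1 X_{i1} + b_2 X_{i2} \tag{5.2.27}$$

The variance of $\hat{Y}_i$ is then

$$
\begin{aligned}
V(\hat{Y}_i) &= X_{i1}^2 V(b_1) + X_{i2}^2 V(b_2) + 2 X_{i1} X_{i2}\,\mathrm{cov}(b_1, b_2) \\
&= \left[X_{i1}^2 c_{22} + X_{i2}^2 c_{11} - 2 X_{i1} X_{i2} c_{12}\right]\frac{\sigma^2}{\Delta}
\end{aligned}
\tag{5.2.28}
$$

where (5.2.26) is used. It can also be shown that $\mathrm{cov}(Y_i, \hat{Y}_i)$ is equal to the
same value or

$$\mathrm{cov}(Y_i, \hat{Y}_i) = V(\hat{Y}_i) \tag{5.2.29}$$

From (5.2.14), (5.2.28), and (5.2.29) the variance of the residual $e_i$ ($= Y_i - \hat{Y}_i$)
is equal to

$$V(e_i) = \left[\Delta - X_{i1}^2 c_{22} - X_{i2}^2 c_{11} + 2 X_{i1} X_{i2} c_{12}\right]\frac{\sigma^2}{\Delta} \tag{5.2.30}$$

Then using the result that $E(S_{\min})$ is equal to $\sum V(e_i)$ given by (5.2.13b),
we find that

$$E(S_{\min}) = \left[n\Delta - c_{11} c_{22} - c_{22} c_{11} + 2 c_{12}^2\right]\frac{\sigma^2}{\Delta} = (n - 2)\sigma^2 \tag{5.2.31}$$

since $\Delta = c_{11} c_{22} - c_{12}^2$. Consequently, for the two-parameter case with Model
5 and assumptions 1111-011, an unbiased estimator for $\sigma^2$ is

$$s^2 = \frac{S_{\min}}{n - 2} \qquad (n > 2) \tag{5.2.32}$$

which differs from (5.2.21) in that there is a factor of $n - 2$ rather than
$n - 1$. Observe that (5.2.32) is properly meaningless for $n = 2$. For two
parameters and two observations the two residuals must be zero, also giving
$S_{\min} = 0$. Consequently, for two parameters, $\sigma^2$ can be estimated only if
$n > 2$.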
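
The Model 5 results can be collected into one short routine. The sketch below forms the sums of (5.2.23c), solves the normal equations with (5.2.24), and evaluates (5.2.32) and the variance factors of (5.2.26). The function name and the four data points are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def ols_two_parameter(x1, x2, y):
    """OLS for eta_i = beta_1*X_i1 + beta_2*X_i2 via the normal equations."""
    n = len(y)
    c11 = sum(a * a for a in x1)
    c12 = sum(a * b for a, b in zip(x1, x2))
    c22 = sum(b * b for b in x2)
    d1 = sum(yi * a for yi, a in zip(y, x1))
    d2 = sum(yi * b for yi, b in zip(y, x2))
    delta = c11 * c22 - c12 ** 2
    if delta == 0:
        raise ValueError("normal equations are singular (Delta = 0)")
    b1 = (d1 * c22 - d2 * c12) / delta                 # (5.2.24a)
    b2 = (d2 * c11 - d1 * c12) / delta                 # (5.2.24b)
    s_min = sum((yi - b1 * a - b2 * b) ** 2 for yi, a, b in zip(y, x1, x2))
    s2 = s_min / (n - 2) if n > 2 else None            # (5.2.32) needs n > 2
    # V(b1)/sigma^2, V(b2)/sigma^2, cov(b1, b2)/sigma^2 from (5.2.26)
    var_factors = (c22 / delta, c11 / delta, -c12 / delta)
    return b1, b2, s2, var_factors


# Hypothetical data; X_i1 = 1 makes this a straight line with an intercept.
x1 = [1.0, 1.0, 1.0, 1.0]
x2 = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 7.0]

b1, b2, s2, var_factors = ols_two_parameter(x1, x2, y)
print(b1, b2, s2, var_factors)
```

Multiplying each variance factor by $s^2$ gives the estimated variances and covariance when $\sigma^2$ is unknown.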





5.2.2.2 Model 3, $\eta_i = \beta_0 + \beta_1 X_i$

Model 3 results can be found from those of Model 5 by replacing in Model
5 $\beta_1$ by $\beta_0$, $b_1$ by $b_0$, $\beta_2$ by $\beta_1$, $b_2$ by $b_1$, $X_{i1}$ by 1, and $X_{i2}$ by $X_i$. This gives

$$c_{11} = n, \qquad c_{12} = \sum X_i, \qquad c_{22} = \sum X_i^2 \tag{5.2.33}$$

$$\Delta = n\left(\sum X_i^2\right) - \left(\sum X_i\right)^2, \qquad d_1 = \sum Y_i, \qquad d_2 = \sum Y_i X_i \tag{5.2.34}$$

One must be careful where the squares are placed in $\Delta$; note that $\sum X_i^2$
means the sum of $X_i^2$ whereas $(\sum X_i)^2$ means the square of the sum of the $X_i$
values. It also can be shown that $\Delta$ is also equal to

$$\Delta = n\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \bar{X} \equiv \frac{1}{n}\sum_{j=1}^{n} X_j \tag{5.2.35a,b}$$

From the above relations $b_1$, the estimator of $\beta_1$ in Model 3, which is
$\eta_i = \beta_0 + \beta_1 X_i$, can be found from $b_2$ in (5.2.24b) to be

$$b_1 = \frac{n\left(\sum Y_i X_i\right) - \left(\sum Y_i\right)\left(\sum X_i\right)}{\Delta} \tag{5.2.36}$$

Using (5.2.35a) this expression can also be written (Model 3)

$$b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \tag{5.2.37}$$

where $\bar{Y} = \sum Y_i / n$ and the range of each summation is from $i = 1$ to $n$. The
estimator $b_0$ can also be found from (5.2.24a) by using the expression
for $b_1$. Instead we shall use (5.2.23a) divided by $n$ (and $b_1 \to b_0$ and $b_2 \to b_1$)
to get

$$b_0 = \bar{Y} - b_1 \bar{X} \tag{5.2.38}$$

Hence if $\bar{X} = \sum X_i / n$ is equal to zero, $b_0$ is simply $\bar{Y}$. For this reason and
the resulting simplifications in (5.2.37), a transformation sometimes used in
hand calculations redefines $X_i$ so that $\bar{X} = 0$.

As mentioned several times above, no statistical assumptions are used to
obtain the estimators $b_0$ and $b_1$ given respectively by (5.2.38) and
(5.2.37). Suppose now that the standard assumptions are valid. A number
of these are illustrated by Fig. 5.1 for Model 3, $\eta_i = \beta_0 + \beta_1 X_i$. The normal
probability density is superimposed upon the curve for several $X_i$ values.
The first two assumptions of additive, zero mean errors are implied in Fig.
5.1. The third assumption of constant variance is depicted explicitly, as is
the normality assumption (number 5). The nonstochastic $X_i$ assumption
(number 7) is implied by the lack of a probability density in the $X_i$
direction.

Figure 5.1 Linear model with $Y$ being a random variable with constant $\sigma^2$ and normal
probability distribution.

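Mean and Variances for Model 3

The OLS estimates of $\beta_0$ and $\beta_1$ are unbiased for additive, zero mean
errors as was demonstrated for the more general case, Model 5.

From (5.2.26), (5.2.33), and (5.2.34) the variances and covariance of $b_0$
and $b_1$ are

$$V(b_0) = \frac{\sigma^2 \sum X_i^2}{\Delta}, \qquad V(b_1) = \frac{n\sigma^2}{\Delta} \tag{5.2.39a,b}$$

$$\mathrm{cov}(b_0, b_1) = -\frac{n\bar{X}\sigma^2}{\Delta} \tag{5.2.40}$$

where $\Delta$ is given by (5.2.35a). Assumptions 1111--11 are used.

From (5.2.28) and (5.2.30) the variances of the predicted value $\hat{Y}_i$ and
the residual $e_i$ can be written

$$V(\hat{Y}_i) = \sigma^2\left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_j - \bar{X})^2}\right] \tag{5.2.41a}$$

$$V(e_i) = \sigma^2\left[1 - \frac{1}{n} - \frac{(X_i - \bar{X})^2}{\sum (X_j - \bar{X})^2}\right] \tag{5.2.41b}$$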



Unlike the variances of $b_0$ and $b_1$, the variances of $\hat{Y}_i$ and $e_i$ are functions
of $i$. Note that $V(\hat{Y}_i)$ has a minimum at $X_i = \bar{X}$ and maximum value at the
smallest or largest value of $X_i$. The variance of the residual $e_i$ is different
in that it has a maximum at $X_i = \bar{X}$.

The estimated standard errors of $b_0$ and $b_1$ are found from (5.2.39a, b) to
be, for assumptions 1111-011,

$$\text{est. s.e.}(b_0) = s\left[\frac{\sum X_i^2}{\Delta}\right]^{1/2} \tag{5.2.42a}$$

$$\text{est. s.e.}(b_1) = s\left[\frac{n}{\Delta}\right]^{1/2}, \qquad \Delta = n\sum (X_i - \bar{X})^2 \tag{5.2.42b}$$

where from (5.2.32), $s = [S_{\min}/(n - 2)]^{1/2}$.

For Model 3 the sum of the residuals is equal to zero or

$$\sum_{i=1}^{n} e_i = 0 \tag{5.2.43}$$

This interesting result can be used to check the accuracy of calculations for
the parameters. This result is true for any linear or nonlinear model
provided there is a $\beta_0$ term in the model, that is, a parameter not multiplied
by a function of an independent variable, and provided OLS is used.

Example 5.2.3

Experiments have been performed for the heat transfer to air flowing in a pipe. A
dimensionless group related to the heat flow rate is the Nusselt number, designated
Nu. This is a function of the Reynolds number, denoted Re, which is proportional
to the average velocity in the tube. Below are some values for the turbulent fluid
flow range.

    Re    10^4    2 x 10^4    4 x 10^4    5 x 10^4
    Nu    32      60          90          119

The suggested model is $\mathrm{Nu} = a_0 \mathrm{Re}^{a_1}$ where the parameters are $a_0$ and $a_1$. Reduce to
a linear form and estimate $a_0$ and $a_1$ using ordinary least squares with log Nu being
the dependent variable.

Solution

Take the logarithm to the base 10 to get

$$\log \mathrm{Nu} = \log a_0 + a_1 \log \mathrm{Re}$$

For convenience write the model in the Model 3 form, $\eta_i = \beta_0 + \beta_1 X_i$, with

$$\log \mathrm{Nu} \to \eta_i, \qquad \log a_0 \to \beta_0, \qquad a_1 \to \beta_1, \qquad \log \mathrm{Re} \to X_i$$

The tabulated values of Nu are used to obtain log Nu, which is now $Y_i$ as given
below.

    X_i (= log Re)    4.0       4.3010    4.6021    4.6990
    Y_i               1.5051    1.7782    1.9542    2.0755

The estimates $b_0$ and $b_1$ are found using (5.2.37) and (5.2.38). In these equations
the following are needed,

$$\bar{X} = \frac{\sum X_i}{n} = \frac{4.0 + 4.3010 + 4.6021 + 4.6990}{4} = 4.400525$$

$$\sum (X_i - \bar{X})^2 = (4 - 4.400525)^2 + \cdots + (4.699 - 4.400525)^2 = 0.3000453$$

$$\sum (X_i - \bar{X}) Y_i = (4 - 4.400525)(1.5051) + (4.301 - 4.400525)(1.7782) + \cdots = 0.2335972$$

Then (5.2.37) gives

$$b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{0.2335972}{0.3000453} = 0.77854$$

and from (5.2.38) $b_0$ is

$$b_0 = \bar{Y} - b_1 \bar{X} = -1.597734$$

The estimate of $a_0$ is

$$\hat{a}_0 = 10^{b_0} = 0.0252503$$

Thus the prediction equation for Nu is

$$\widehat{\mathrm{Nu}} = 0.0253\,\mathrm{Re}^{0.779}$$

where some of the decimal places have been dropped.
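
As a check on Example 5.2.3, the sketch below applies (5.2.37) and (5.2.38) to the four-decimal logarithms tabulated in the solution. The variable names are illustrative. Using unrounded logarithms (for instance from `math.log10`) would be expected to change the later decimal places slightly.

```python
# Four-decimal base-10 logarithms as tabulated in the solution
x = [4.0, 4.3010, 4.6021, 4.6990]       # X_i = log10(Re)
y = [1.5051, 1.7782, 1.9542, 2.0755]    # Y_i = log10(Nu)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sum_dev_sq = sum((xi - x_bar) ** 2 for xi in x)
sum_dev_y = sum((xi - x_bar) * yi for xi, yi in zip(x, y))

b1 = sum_dev_y / sum_dev_sq    # (5.2.37), estimate of a_1
b0 = y_bar - b1 * x_bar        # (5.2.38), estimate of log10(a_0)
a0_hat = 10.0 ** b0            # back-transform to a_0

print(f"b1 = {b1:.5f}, b0 = {b0:.6f}, a0_hat = {a0_hat:.5f}")
```

Note that least squares is applied here to log Nu, so the errors are treated as additive in the logarithm rather than in Nu itself.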





Example 5.2.4

Normal random error terms [2] with a mean of zero and unit variance have been
added to the model $\eta_i = \beta_0 + \beta_1 X_i$ with $\beta_0$ set equal to 1 and $\beta_1$ set equal to 0.1. The
"data" are tabulated in Table 5.1.

(a) Estimate the parameters $\beta_0$ and $\beta_1$ using ordinary least squares.

(b) Find the estimated standard errors for $b_0$, $b_1$, and $\hat{Y}_i$ using the standard
assumptions except that the errors need not be normal and that $\sigma^2$ is unknown
(1111-011).


Table 5.1 Data for Example 5.2.4

    Observation    X_i    eta_i    epsilon_i    Y_i
    1               0     1.0      -0.742       0.258
    2              10     2.0      -0.034       1.966
    3              20     3.0       1.453       4.453
    4              30     4.0       0.963       4.963
    5              40     5.0       0.040       5.040
    6              50     6.0       0.418       6.418
    7              60     7.0       1.792       8.792
    8              70     8.0      -0.374       7.626
    9              80     9.0      -0.222       8.778

    Sums: 360 = sum of X_i;  3.294 = sum of epsilon_i;  48.294 = sum of Y_i

Solution

(a) The OLS estimators $b_0$ and $b_1$ are given by (5.2.37) and (5.2.38). In these
equations $\bar{X}$ and $\bar{Y}$ are needed,

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{9}(0 + 10 + 20 + \cdots + 80) = \frac{360}{9} = 40$$

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{9}(0.258 + 1.966 + \cdots + 8.778) = \frac{48.294}{9} = 5.366$$

Additional required calculations are given in the second, third, and fourth columns
of Table 5.2. Then the estimates of $\beta_1$ and $\beta_0$ are

$$b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{611.93}{6000} = 0.10198833$$

$$b_0 = \bar{Y} - b_1 \bar{X} = 5.366 - (0.10198833)(40) = 1.2864667$$

which happen to be about 2% and 29% larger than the true values.





Table 5.2 Calculations for Example 5.2.4

    X_i    X_i - Xbar    (X_i - Xbar)^2    Y_i (X_i - Xbar)    Yhat_i     e_i = Y_i - Yhat_i
     0       -40            1600              -10.32           1.28647      -1.02847
    10       -30             900              -58.98           2.30635      -0.34035
    20       -20             400              -89.06           3.32623       1.12677
    30       -10             100              -49.63           4.34612       0.61688
    40         0               0                0              5.36600      -0.32600
    50        10             100               64.18           6.38588       0.03212
    60        20             400              175.84           7.40577       1.38623
    70        30             900              228.78           8.42565      -0.79965
    80        40            1600              351.12           9.44553      -0.66753

    Sums: 360 = sum of X_i;  6000 = sum of (X_i - Xbar)^2;
          611.93 = sum of (X_i - Xbar) Y_i;  0.00000 = sum of (Y_i - Yhat_i)


All eight significant figures given in these estimates are not needed, but it is
usually wise to carry a couple of extra significant digits in the calculations because
there can be small differences of large numbers.

The predicted value of the dependent variable, $\hat{Y}_i$, can be found from

$$\hat{Y}_i = \bar{Y} + b_1 (X_i - \bar{X}) = 5.366 + 0.10198833 (X_i - 40)$$

and is also given in Table 5.2. The residuals $e_i$ are also given. Note that the sum is
zero.

(b) In order to find the estimated standard errors it is necessary to evaluate $s$,
which in turn needs $S_{\min}$, which is 5.937718. Then from (5.2.32)

$$s = \left[\frac{S_{\min}}{n - 2}\right]^{1/2} = \left[\frac{5.937718}{9 - 2}\right]^{1/2} = 0.921002$$

which is an estimate of the standard deviation. Compared with the true value of
unity this is only about 8% too low.

From (5.2.42a) the standard error of $b_0$ is

$$\text{est. s.e.}(b_0) = \left[\frac{\sum X_i^2}{n\sum (X_i - \bar{X})^2}\right]^{1/2} s = \left[\frac{20400}{9(6000)}\right]^{1/2} (0.921002) = 0.56608$$

and the standard error of $b_1$ is obtained from (5.2.42b),

$$\text{est. s.e.}(b_1) = \frac{s}{\left[\sum (X_i - \bar{X})^2\right]^{1/2}} = \frac{0.921002}{(6000)^{1/2}} = 0.011890$$

Notice that $b_0 \pm \text{est. s.e.}(b_0)$ is $1.286 \pm 0.566$, which includes the true value of 1;
$b_1 \pm \text{est. s.e.}(b_1)$ is $0.10199 \pm 0.0119$, which also includes the true value of 0.1.





Statistical statements regarding the accuracy of the estimates are discussed in
Chapter 6 in connection with the confidence region.

The estimated standard error of the predicted (or smoothed) value of $Y_i$ using
(5.2.41a) is

$$\text{est. s.e.}(\hat{Y}_i) = \left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_j - \bar{X})^2}\right]^{1/2} s = \left[\frac{1}{9} + \frac{(X_i - 40)^2}{6000}\right]^{1/2} (0.921002)$$

which varies from a minimum at $X_i = 40$ of 0.307 to maximums of 0.566 at $X_i = 0$
and 80. This latter value is the same as for est. s.e.$(b_0)$ because $b_0$ in this case is also
the $X_i = 0$ value of $\hat{Y}_i$.
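
The whole of Example 5.2.4 can be reproduced in a few lines, which also shows the residual-sum check of (5.2.43) in action. This is a sketch using the data of Table 5.1; the helper name `se_predicted` and the variable names are illustrative only.

```python
import math

x = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
y = [0.258, 1.966, 4.453, 4.963, 5.040, 6.418, 8.792, 7.626, 8.778]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sum_dev_sq = sum((xi - x_bar) ** 2 for xi in x)

b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sum_dev_sq   # (5.2.37)
b0 = y_bar - b1 * x_bar                                            # (5.2.38)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_min = sum(e * e for e in residuals)
s = math.sqrt(s_min / (n - 2))                                     # (5.2.32)

se_b0 = s * math.sqrt(sum(xi * xi for xi in x) / (n * sum_dev_sq)) # (5.2.42a)
se_b1 = s / math.sqrt(sum_dev_sq)                                  # (5.2.42b)


def se_predicted(xi):
    """Estimated standard error of the predicted value, from (5.2.41a)."""
    return s * math.sqrt(1.0 / n + (xi - x_bar) ** 2 / sum_dev_sq)


print(f"b0 = {b0:.7f}, b1 = {b1:.8f}")
print(f"sum of residuals = {sum(residuals):.2e}")   # (5.2.43): zero up to rounding
print(f"s = {s:.6f}, se(b0) = {se_b0:.5f}, se(b1) = {se_b1:.6f}")
print(f"se(Yhat) at X = 40: {se_predicted(40.0):.3f}, at X = 0: {se_predicted(0.0):.3f}")
```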


5.2.2.3 Estimators for Model 4, $\eta_i = \beta_0' + \beta_1 (X_i - \bar{X})$

Model 4 is interesting because a number of the results have simple forms.
Without any statistical assumptions the OLS estimator for $\beta_0'$ is

$$b_0' = \bar{Y} \tag{5.2.44}$$

and the OLS estimator for $\beta_1$ is the same as that given for Model 3.

Using the assumptions 1111-011 the variance of $b_0'$ is

$$V(b_0') = \frac{\sigma^2}{n} \tag{5.2.45}$$

and that of $b_1$ is given by (5.2.39b). The covariance of $b_0'$ and $b_1$ is simply

$$\mathrm{cov}(b_0', b_1) = 0 = \mathrm{cov}(\bar{Y}, b_1) \tag{5.2.46}$$

The variances of $\hat{Y}_i$ and $e_i$ are equal to those given for Model 3,
(5.2.41a, b).


5.2.2.4 Optimal Experiments for Models 3 and 4 

If one has the freedom of taking the observations at any $X_i$ values for
estimating parameters in Models 3 and 4, then one should select the $X_i$
values so that the most accurate estimates of parameter values are produced.
Such designs of experiments are termed optimal and yield optimal
parameter estimates. Our criterion of optimality in this section is that of
minimum variance of $b_1$. A more general criterion and analysis is given in
Chapter 8.

Models 3 and 4 provide exactly the same OLS $\hat{Y}_i$ values. For that reason
we consider the variances for Model 4 for assumptions 1111-011. The
variance of $b_0'$ is independent of $X_i$ and the covariance of $b_0'$ and $b_1$ is zero.
Hence only the variance of $b_1$, which is given by (5.2.39b), need be
considered. Note that $V(b_1)$ is minimized by maximizing $\sum (X_i - \bar{X})^2$. Let
the maximum permissible range of $X_i$ be between $X_{\min}$ and $X_{\max}$. Then it
can be rigorously shown that $V(b_1)$ is minimized if one half the measurements
are made at $X_{\min}$ and the other half at $X_{\max}$. No intermediate
measurements are taken. The optimal case is illustrated by Fig. 5.2.

The variances of $b_1$ with uniform spacing of the $X_i$ values given by

$$X_i = (i - 1 + c)\delta, \qquad i = 1, 2, \ldots, n \tag{5.2.47}$$

for various models are given in the fifth column of Table 5.3, which is a
summary of the results of this section. The spacing between the $X_i$ values is
$\delta$ and the first $X_i$ value is $X_1 = c\delta$, where $c$ is a factor locating $X_1$. The
largest $X_i$ value is $X_n = (n - 1 + c)\delta$. For this uniform spacing the variance
of $b_1$ is

$$V_u(b_1) = \frac{12\sigma^2}{n(n^2 - 1)\delta^2} \tag{5.2.48}$$

If one half the observations were located at $c\delta$ and the other half at
$(n - 1 + c)\delta$, the variance of $b_1$ is (for this nonuniform spacing)

$$V_n(b_1) = \frac{4\sigma^2}{n(n - 1)^2\delta^2} \tag{5.2.49}$$

Figure 5.2 Recommended location of measurements when model is known to be a straight
line in $X$.

The ratio of $V_u(b_1)/V_n(b_1)$ is

$$\frac{V_u(b_1)}{V_n(b_1)} = \frac{3(n - 1)}{n + 1} \tag{5.2.50}$$

which is equal to 1 for $n = 2$ and monotonically increases to 3 as $n \to \infty$.
Hence for large $n$, there is a factor 3 in the ratio of variances of $b_1$ for the
uniform spaced case and the case of placement of the observations at the
extremes.
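
The ratio (5.2.50) is easy to confirm numerically. Since $V(b_1)$ is inversely proportional to $\sum (X_i - \bar{X})^2$, the ratio of variances equals the inverse ratio of these sums. The sketch below builds both designs for even $n$; the function names and the default values of `delta` and `c` are illustrative assumptions.

```python
def sum_sq_dev(x):
    """Sum of squared deviations of x about its mean."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)


def variance_ratio(n, delta=1.0, c=0.0):
    """V_u(b1) / V_n(b1) for even n, computed directly from the two designs."""
    uniform = [(i - 1 + c) * delta for i in range(1, n + 1)]      # (5.2.47)
    half = n // 2
    extremes = [c * delta] * half + [(n - 1 + c) * delta] * (n - half)
    # V(b1) is proportional to 1 / sum_sq_dev, so the ratio inverts
    return sum_sq_dev(extremes) / sum_sq_dev(uniform)


for n in (2, 4, 10, 100):
    print(n, variance_ratio(n), 3 * (n - 1) / (n + 1))
```

The two printed columns should agree for each even $n$, and the result does not depend on the choice of `delta` or `c`.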

In using the next to last column of Table 5.3 one should note that

$$X_{\min} = c\delta, \qquad X_{\max} = (n - 1 + c)\delta$$

and thus

$$\delta = \frac{X_{\max} - X_{\min}}{n - 1} \tag{5.2.51}$$

$$c = \frac{X_{\min}(n - 1)}{X_{\max} - X_{\min}} \tag{5.2.52}$$

In this discussion of optimal design of experiments it is important to
note that the standard assumptions of 1111-011 are assumed. Also there
should be no uncertainty regarding the validity of the model. If the model
is in question then one would be better advised to choose equal spacing of
the $X_i$ values or equal spacing in "time" if $X_i$ is a function of time such as
$t_i^2/2$.


5.2.3 Comments Regarding Definitions

In this section a number of definitions are given. Some of these can be
confusing. There are, for example, several expressions related to $Y_i$. We
have

    $Y_i = \eta_i + \varepsilon_i$, measured value of $Y_i$

    $E(Y_i) = \eta_i$, expected value of $Y_i$ or model or dependent variable

    $\hat{Y}_i = b_0 + b_1 X_i$, predicted value of $Y_i$ for Model 3

    $\bar{Y} = \sum Y_i / n$, average value of $Y_i$ for $i = 1$ to $i = n$

Also used is the symbol $\varepsilon_i$ for measurement error or noise. This should not
be confused with the residual $e_i$, which is $Y_i - \hat{Y}_i$. The independent variable
$X_i$ is assumed to be errorless and has an average value given by $\bar{X} =
\sum X_i / n$. All these terms are illustrated in Fig. 5.3. Modified definitions for
$\bar{X}$ and $\bar{Y}$ may be used in subsequent sections when $\sigma_i^2$ is not a constant.





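For the uniform spacing of (5.2.47):

$$\bar{X} = \frac{1}{n}\sum X_i = \frac{1}{2}(n - 1 + 2c)\delta, \qquad \sum (X_i - \bar{X})^2 = \frac{1}{12} n(n^2 - 1)\delta^2$$

$$\sum X_i^2 = \frac{\delta^2}{6}\left[(n + c)(n + c - 1)(2n + 2c - 1) - c(c - 1)(2c - 1)\right]$$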



5.3 MAXIMUM LIKELIHOOD (ML) ESTIMATION

Maximum likelihood estimates make use of whatever information we have
about the distribution of the observations. We illustrate ML estimation for
the case of additive errors, $Y_i = \eta_i + \varepsilon_i$, and when the errors $\varepsilon_i$ have
zero mean, are independent, are normal, and have known variances $\sigma_i^2$.
The $X_i$'s are errorless and the parameters are nonrandom. These assumptions
are designated 11-11111. This information can be used to obtain
estimates of parameter variances.

The natural logarithm of the normal probability density for independent
measurements is given by

$$\ln f(Y_1, Y_2, \ldots, Y_n \mid \beta_1, \beta_2, \ldots) = -\frac{1}{2}\left[n \ln 2\pi + \sum_{i=1}^{n} \ln \sigma_i^2 + S_{ML}\right] \tag{5.3.1}$$

where the "physical" parameters are only contained in $S_{ML}$,

$$S_{ML} = \sum_{i=1}^{n} \frac{(Y_i - \eta_i)^2}{\sigma_i^2} \tag{5.3.2}$$

The one- and two-parameter cases are considered briefly in this section.
It is pointed out that the ML estimators for Models 2 and 5 can be given in
a similar form to those given by OLS.





5.3.1 One-Parameter Cases 


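Consider the linear model of $\eta_i = \beta_1 X_i$ (Model 2) and introduce this $\eta_i$
expression in (5.3.2). The function $\ln f(Y_1, \ldots, Y_n \mid \beta_1)$ is maximized with
respect to $\beta_1$ by minimizing $S_{ML}$ since $\beta_1$ appears only in $S_{ML}$. Differentiating
with respect to $\beta_1$, replacing $\beta_1$ by its estimator $b_1$, and setting the
derivative equal to zero yields the normal equation

$$\sum_{i=1}^{n} (Y_i - b_1 X_i) X_i \sigma_i^{-2} = 0 \tag{5.3.3}$$

which can be solved for $b_1$ to obtain (for Model 2)

$$b_1 = \left[\sum Y_i X_i \sigma_i^{-2}\right]\left[\sum X_i^2 \sigma_i^{-2}\right]^{-1} \tag{5.3.4}$$

Note that this expression reduces to exactly the same one as given by
(5.2.5) for OLS if $\sigma_i^2 = \sigma^2$, a constant. Also note that by defining

$$F_i \equiv \frac{Y_i}{\sigma_i}, \qquad Z_i \equiv \frac{X_i}{\sigma_i} \tag{5.3.5}$$

(5.3.4) can be written as

$$b_1 = \left(\sum F_i Z_i\right)\left(\sum Z_i^2\right)^{-1} \tag{5.3.6}$$

which is also similar to the OLS expression, (5.2.5); here $F_i$ is analogous to
$Y_i$ and $Z_i$ to $X_i$. In terms of $F_i$ and $Z_i$, $S_{ML}$ is a sum of squares of terms
which have constant variance and has the same form as $S$ for OLS. Finally
note that the variance of $F_i$ is unity.

From the analogies given above between $Y_i$ and $F_i$, $X_i$ and $Z_i$, and $\sigma^2$
and unity, the variance of $b_1$ can be found from (5.2.10) to be

$$V(b_1) = \left(\sum Z_i^2\right)^{-1} = \left[\sum X_i^2 \sigma_i^{-2}\right]^{-1} \tag{5.3.7}$$

For Model 1, $\eta_i = \beta_0$, the estimator $b_0$ and the variance of $b_0$ are found
by letting $X_i = 1$ in the above two equations.

$$b_0 = \bar{Y}; \qquad \bar{Y} \equiv \left(\sum Y_i \sigma_i^{-2}\right)\left(\sum \sigma_i^{-2}\right)^{-1} \tag{5.3.8a,b}$$

$$V(b_0) = \left[\sum \sigma_i^{-2}\right]^{-1} \tag{5.3.8c}$$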
(5.3.8c) 





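5.3.2 Two-Parameter Cases

For the general model, Model 5, given by $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, the estimators
for $\beta_1$ and $\beta_2$ and their variances can be obtained by letting

$$Y_i \to F_i = \frac{Y_i}{\sigma_i}, \qquad X_{ij} \to Z_{ij} = \frac{X_{ij}}{\sigma_i}, \qquad \sigma^2 \to 1 \tag{5.3.9}$$

and thus (5.2.24) and (5.2.26) could be used for the estimators $b_1$ and $b_2$,
their variances, and covariance.

For Model 3, $\eta_i = \beta_0 + \beta_1 X_i$, and assumptions 11-11111, (5.3.9) can be
used to find

$$b_1 = \frac{\sum Y_i \sigma_i^{-2} (X_i - \bar{X})}{\sum (X_i - \bar{X})^2 \sigma_i^{-2}}, \qquad b_0 = \bar{Y} - b_1 \bar{X} \tag{5.3.10a,b}$$

$$\bar{X} \equiv \frac{\sum X_i \sigma_i^{-2}}{\sum \sigma_i^{-2}}, \qquad \bar{Y} \equiv \frac{\sum Y_i \sigma_i^{-2}}{\sum \sigma_i^{-2}} \tag{5.3.11a,b}$$

$$V(b_0) = \frac{\sum X_i^2 \sigma_i^{-2} \big/ \sum \sigma_i^{-2}}{\sum (X_i - \bar{X})^2 \sigma_i^{-2}}, \qquad V(b_1) = \frac{1}{\sum (X_i - \bar{X})^2 \sigma_i^{-2}} \tag{5.3.12a,b}$$

Note the new definition of $\bar{X}$ given by (5.3.11a). The same definition of
$\bar{Y}$ is given in (5.3.8b) and (5.3.11b). For constant $\sigma_i^2$, these definitions for
$\bar{X}$ and $\bar{Y}$ reduce to those given in Section 5.2.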

Example 5.3.1

Simple harmonic motion can be described by $\eta_i = \beta_0 + \beta_1 \sin t_i$, where $\beta_0$ is a shift of
the axis and $\beta_1$ is the amplitude of the motion. Measurements and their standard
deviations vary as indicated in the following table.

    i    t_i (deg)    sigma_i    Y_i
    1      0          0.01       0.4926
    2     30          0.05       0.9985
    3     90          0.1        1.3547
    4    150          0.05       0.9519
    5    180          0.01       0.4996




(a) Estimate the parameters using ML. Let the standard assumptions apply
except that we do not assume that $\sigma_i^2$ equals a constant.

(b) Find the standard errors for $b_0$ and $b_1$.

Solution

(a) For this example, the model is Model 3 and the estimators are given by (5.3.10)
and (5.3.11). Note that $X_i = \sin t_i$. Some of the required detailed calculations are
given below.


    i    X_i    sigma_i^-2    X_i sigma_i^-2    X_i - Xbar    sigma_i^-2 (X_i - Xbar)^2    Y_i sigma_i^-2    Y_i sigma_i^-2 (X_i - Xbar)
    1    0      10,000          0               -0.0239         5.723                       4926.0            -117.847
    2    0.5       400        200                0.4761        90.660                        399.4             190.145
    3    1         100        100                0.9761        95.273                        135.47            132.229
    4    0.5       400        200                0.4761        90.660                        380.76            181.271
    5    0      10,000          0               -0.0239         5.723                       4996.0            -119.522

    Sums:       20,900        500                             288.039                      10837.63            266.276


In addition to the sums indicated in the above table, $\bar{X}$ and $\bar{Y}$ are found from
(5.3.11) to be

$$\bar{X} = \frac{500}{20900} = 0.0239234, \qquad \bar{Y} = \frac{10837.63}{20900} = 0.518547$$

Then from (5.3.10)

$$b_1 = \frac{266.276}{288.039} = 0.924449$$

$$b_0 = \bar{Y} - b_1 \bar{X} = 0.518547 - 0.924449(0.0239234) = 0.496431$$

(b) The standard errors are found from the square roots of (5.3.12a, b),

$$\text{s.e.}(b_0) = \left[\frac{\sum X_i^2 \sigma_i^{-2} \big/ \sum \sigma_i^{-2}}{\sum (X_i - \bar{X})^2 \sigma_i^{-2}}\right]^{1/2} = \left[\frac{300/20900}{288.039}\right]^{1/2} = 0.0070593$$

$$\text{s.e.}(b_1) = \left[\sum (X_i - \bar{X})^2 \sigma_i^{-2}\right]^{-1/2} = (288.039)^{-1/2} = 0.05892$$

Least squares estimates of the parameters for this example are $b_0 = 0.510329$ and
$b_1 = 0.872829$. The $b_0$ value is outside the $b_0 \pm \text{s.e.}(b_0)$ interval found using
maximum likelihood.
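
The weighted calculations of Example 5.3.1 can be checked as follows. The sketch evaluates (5.3.10) through (5.3.12) with weights $\sigma_i^{-2}$, and for comparison repeats the fit with equal weights to obtain the ordinary least squares values. The function name `weighted_line_fit` is an illustrative choice, not notation from the text.

```python
import math


def weighted_line_fit(x, y, sigma):
    """ML fit of eta = beta_0 + beta_1*X with known sigma_i, per (5.3.10)-(5.3.12)."""
    w = [1.0 / s ** 2 for s in sigma]                       # weights sigma_i^-2
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w    # (5.3.11a)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w    # (5.3.11b)
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sum(wi * yi * (xi - x_bar) for wi, xi, yi in zip(w, x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    se_b0 = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)) / sum_w / sxx)
    se_b1 = math.sqrt(1.0 / sxx)
    return b0, b1, se_b0, se_b1


t_deg = [0.0, 30.0, 90.0, 150.0, 180.0]
sigma = [0.01, 0.05, 0.1, 0.05, 0.01]
y = [0.4926, 0.9985, 1.3547, 0.9519, 0.4996]
x = [math.sin(math.radians(t)) for t in t_deg]     # X_i = sin t_i

b0, b1, se_b0, se_b1 = weighted_line_fit(x, y, sigma)
# Equal weights reduce the same formulas to OLS point estimates
b0_ols, b1_ols, _, _ = weighted_line_fit(x, y, [1.0] * len(y))

print(f"ML:  b0 = {b0:.6f}, b1 = {b1:.6f}")
print(f"     s.e.(b0) = {se_b0:.7f}, s.e.(b1) = {se_b1:.5f}")
print(f"OLS: b0 = {b0_ols:.6f}, b1 = {b1_ols:.6f}")
```

The standard errors returned for the equal-weight call are discarded, because they would presume $\sigma_i = 1$, which is not the case here.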


5.3.3 Estimating $\sigma^2$ Using Maximum Likelihood

When the error variance is a constant, that is, $\sigma_i^2 = \sigma^2$, an estimator for $\sigma^2$
can be obtained by differentiating (5.3.1) with respect to $\sigma^2$ and setting the
result equal to zero. The result is

$$-\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = 0 \tag{5.3.13}$$

or

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{5.3.14}$$

This is unfortunately a biased estimator for $\sigma^2$. For one parameter, the
denominator should be $n - 1$ to provide an unbiased estimator. For that
and other reasons use (5.2.21) to estimate $\sigma^2$ for one parameter and use
(5.2.32) for two parameters when the assumptions 1111-011 are valid.


5.3.4 Maximum Likelihood Estimation Using Information from Prior
Experiments

After one set of data has been used to estimate the parameters, a second
set of data may become available. If the second set of observations is
independent of the first and parameter estimates based on all the data are
needed, then the first set of data can provide prior information for analysis
of the second set. A method is given below whereby the number of
calculations in simultaneously analyzing all the data can be reduced by
taking advantage of the results of the analysis of the first set of data.

For simplicity let us derive the method for one parameter. The ML
estimator for one set of data when the standard assumptions 11-11111 are
valid is given by (5.3.6); assume that there are $n_1$ observations and write
(5.3.6) as

$$V_1^{-1} b_1 = \sum_{i=1}^{n_1} F_i Z_i \tag{5.3.15}$$

where $b_1$ here denotes the estimate based on the first $n_1$ observations and
$V_1$ is the variance of $b_1$,

$$V_1 = \left(\sum_{i=1}^{n_1} Z_i^2\right)^{-1} \tag{5.3.16}$$

Consider now a combined analysis of $n = n_1 + n_2$ observations. Then (5.3.6)
becomes

$$b_2 = V_2\left[V_1^{-1} b_1 + \sum_{i=n_1+1}^{n} F_i Z_i\right] \tag{5.3.17a}$$

where $V_2$ (the variance of $b_2$) is given by

$$V_2 = \left(V_1^{-1} + \sum_{i=n_1+1}^{n} Z_i^2\right)^{-1} \tag{5.3.17b}$$

We point out that (5.3.17) uses only the previously calculated $b_1$ and $V_1$
values; no other information regarding the first $n_1$ observations is needed
to calculate improved values of $b$ and $V$. The same procedure can be used
for more than one parameter.
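
The claim that the update reproduces the combined analysis can be verified numerically. The sketch below assumes Model 2 with known $\sigma_i$, applies (5.3.6) and (5.3.7) to the first batch, then updates with the second batch in the manner of (5.3.17). The data and the function names are hypothetical.

```python
import math


def ml_model2(x, y, sigma):
    """ML estimate and variance for eta_i = beta_1*X_i, per (5.3.6) and (5.3.7)."""
    f = [yi / si for yi, si in zip(y, sigma)]    # F_i = Y_i / sigma_i
    z = [xi / si for xi, si in zip(x, sigma)]    # Z_i = X_i / sigma_i
    sum_zz = sum(zi * zi for zi in z)
    return sum(fi * zi for fi, zi in zip(f, z)) / sum_zz, 1.0 / sum_zz


def ml_update(b_prev, v_prev, x, y, sigma):
    """Fold new observations into a prior (b, V) pair, in the manner of (5.3.17)."""
    f = [yi / si for yi, si in zip(y, sigma)]
    z = [xi / si for xi, si in zip(x, sigma)]
    v_new = 1.0 / (1.0 / v_prev + sum(zi * zi for zi in z))
    b_new = v_new * (b_prev / v_prev + sum(fi * zi for fi, zi in zip(f, z)))
    return b_new, v_new


# Hypothetical data: first batch of three, second batch of three
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
sigma = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3]
n1 = 3

b_first, v_first = ml_model2(x[:n1], y[:n1], sigma[:n1])
b_seq, v_seq = ml_update(b_first, v_first, x[n1:], y[n1:], sigma[n1:])
b_all, v_all = ml_model2(x, y, sigma)

print(b_seq, b_all)    # should agree to rounding error
print(v_seq, v_all)
```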


5.4 MAXIMUM A POSTERIORI (MAP) ESTIMATION 

There are several ways to introduce prior information. One of these is 
given in Section 5.3.4 above for ML estimation. In this method, informa- 
tion from previous tests is included in such a way that exactly the same 
estimates are obtained as if all the data were analyzed together. This ML 
method also assumed that the parameters were nonrandom. 

Another way to include prior information utilizes the maximum a 
posteriori (MAP) method. The MAP estimators are based on Bayes’s 
theorem and are therefore called bayesian estimators. In the MAP method 
the parameters either are random or are conceived as being random. 
Hence there are two situations when MAP estimators might be used: (1) 
when the parameters are random and (2) when there is subjective informa- 
tion. What is meant by random parameters is discussed further below. 

In this section the standard assumptions of additive, zero mean, uncorrelated,
normal errors as well as known statistical parameters and nonstochastic
independent variables are considered to be valid. Also, there is
information about a prior distribution of values of the parameters ($\beta$). We
assume this prior distribution to be normal with known mean and variance.
We assume throughout our experiment that the $\beta$'s are constant, that
is, nonrandom. These assumptions are designated 11011110. (In Chapter 6,
where a more detailed set of standard assumptions is given, two particular
sets of MAP assumptions considered are designated 11--1112 and
11--1113.)


5.4.1 Random Parameter Case 

In the random parameter case the parameter for a particular experiment or
set of experiments is considered to be constant (or nonrandom). This may
be clarified by an example. A particular steel is occasionally produced by a
plant. The thermal conductivity is known to vary from batch to batch. The
long-run room-temperature average thermal conductivity (the parameter,
$\beta$, of interest) is 20 W/m-°C with the standard deviation among batch
averages being 0.1 W/m-°C. The distribution is normal. Then this information
regarding the random nature of $\beta$ from batch to batch is described
by the probability density of

$$f(\beta) = \left[(2\pi)^{1/2}(0.1)\right]^{-1} \exp\left[-\frac{1}{2}\left(\frac{\beta - 20}{0.1}\right)^2\right] \tag{5.4.1}$$

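The standard deviation of measurements $Y_i$ for a given batch is known to
be 0.4. For a single normal measurement the probability of this measurement
given the true conductivity $\beta$ of the batch is

$$f(Y \mid \beta) = \left[(2\pi)^{1/2}(0.4)\right]^{-1} \exp\left[-\frac{1}{2}\left(\frac{Y - \beta}{0.4}\right)^2\right]$$

Let us use Bayes's theorem in the form

$$f(\beta \mid Y) = \frac{f(Y \mid \beta) f(\beta)}{f(Y)} \tag{5.4.2}$$

where $f(\beta \mid Y)$ is the posterior distribution of $\beta$ given $Y$. It includes information
both from a large number of batches, $f(\beta)$, and from a given batch,
$f(Y \mid \beta)$. If additional measurements $Y_i$ are made, they are also considered to
be from this given batch.

Since the parameter $\beta$ appears only in the numerator of (5.4.2) and since
it is convenient to take the logarithm of (5.4.2), we find that $f(\beta \mid Y)$ is
maximized by minimizing

$$\left(\frac{\beta - 20}{0.1}\right)^2 + \left(\frac{Y - \beta}{0.4}\right)^2 \tag{5.4.3}$$

with respect to $\beta$.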

Notice in this example that the conductivity of a batch chosen at
random is a random parameter. Once the batch is chosen, however, all our
specimens are from this batch and thus the expected value of each is the
same.

If we examine the conductivity as a function of temperature, instead of
having a single parameter corresponding to room-temperature conductivity,
we have a regression function containing a number of parameters. These
parameters vary from batch to batch, but our estimates are estimates of the
specific values of this particular batch.





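Let us now develop an estimator for the parameter $\beta_1$ in Model 2,
$\eta_i = \beta_1 X_i$, with $\beta_1$ being chosen at random from a given
population. With the assumptions mentioned above and that $\beta_1$ is
independent of $\varepsilon_i$, we have

$$ Y_i = \beta_1 X_i + \varepsilon_i \qquad (5.4.4a) $$

$$ \varepsilon_i \sim N(0, \sigma_i^2), \qquad E(\varepsilon_i \beta_1) = 0 \qquad (5.4.4b) $$

and thus the (prior) probability density of the random parameter $\beta_1$ is

$$ f(\beta_1) = (2\pi V_\beta)^{-1/2} \exp\left[-\frac{1}{2}\,\frac{(\beta_1 - \mu_\beta)^2}{V_\beta}\right] \qquad (5.4.5) $$

and that of $Y_1, \ldots, Y_n$ given $\beta_1$ is

$$ f(Y_1, \ldots, Y_n|\beta_1) = \left\{\prod_{i=1}^{n} (2\pi\sigma_i^2)^{-1/2}\right\} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} \frac{(Y_i - \beta_1 X_i)^2}{\sigma_i^2}\right] \qquad (5.4.6) $$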


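Introducing (5.4.5) and (5.4.6) into (5.4.3) and then taking the logarithm of
$f(\beta_1|Y_1, \ldots, Y_n)$ gives

$$ \ln[f(\beta_1|Y_1, \ldots, Y_n)] = -\frac{1}{2}\left[(n+1)\ln 2\pi + \ln V_\beta + \sum_{i=1}^{n} \ln \sigma_i^2 + (\beta_1 - \mu_\beta)^2 V_\beta^{-1} + \sum_{i=1}^{n} (Y_i - \beta_1 X_i)^2 \sigma_i^{-2}\right] - \ln f(Y_1, \ldots, Y_n) \qquad (5.4.7) $$

Note that $f(Y_1, \ldots, Y_n)$ is not a function of the parameter $\beta_1$.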

In (5.4.7) we are effectively considering the joint probability of each
random choice of $\beta_1$ (batch) and the subsequent collection of
observations. We concentrate our attention on those possible choices which
include the observations we actually obtained and hunt among them for that
$\beta_1$ for which the probability is greatest. This $\beta_1$ we use as an
estimate of the particular value for the batch chosen. Note that we are
dealing with a random variable, $\beta_1$, a collection of possible values,
and a constant $\beta_1$, the value actually chosen, that is, the parameter
for the particular batch used in the experiment.

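Taking the derivative of (5.4.7) with respect to $\beta_1$ yields the normal
equation for the estimate $b_1$,

$$ (b_1 - \mu_\beta) V_\beta^{-1} - \sum (Y_i - b_1 X_i) X_i \sigma_i^{-2} = 0 \qquad (5.4.8) $$

which, after the addition and subtraction of $\mu_\beta X_i$ within the
summation, can be written as

$$ (b_1 - \mu_\beta)\left(\sum Z_i^2 + V_\beta^{-1}\right) = \sum (F_i - \mu_\beta Z_i) Z_i \qquad (5.4.9) $$

where

$$ F_i \equiv \frac{Y_i}{\sigma_i}, \qquad Z_i \equiv \frac{X_i}{\sigma_i} \qquad (5.4.10a,b) $$

Solving (5.4.9) for $b_1$ then yields

$$ b_1 = \mu_\beta + \frac{\sum (F_i - \mu_\beta Z_i) Z_i}{\sum Z_i^2 + V_\beta^{-1}} = \frac{\sum F_i Z_i + \mu_\beta V_\beta^{-1}}{\sum Z_i^2 + V_\beta^{-1}} \qquad (5.4.11a,b) $$

The expected value of $b_1$ given by (5.4.11) is $\mu_\beta$. Hence the MAP
estimator for $\beta_1$ is biased since its expected value is not $\beta_1$,
the value for the particular batch.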

The variance of $b_1$ is affected not only by the errors in the
measurements, $Y_i$, but by the variability of $\beta_1$ from batch to
batch. For measurements involving a particular batch we are interested in
the variability of $b_1$ compared to the value of the batch ($\beta_1$).
Hence we are interested in the variance of the difference, $b_1 - \beta_1$.
Using (5.4.11b) we can show that

$$ b_1 - \beta_1 = \frac{\sum Z_i \varepsilon_i \sigma_i^{-1} - (\beta_1 - \mu_\beta) V_\beta^{-1}}{\sum Z_i^2 + V_\beta^{-1}} \qquad (5.4.12) $$

Then the variance of the difference, $b_1 - \beta_1$, is given by

$$ V(b_1 - \beta_1) = \left(\sum Z_i^2 + V_\beta^{-1}\right)^{-1} \qquad (5.4.13) $$

where $V(\beta_1) = V_\beta$ is used. Notice that as more observations are
taken, the relative effect of the prior information regarding the random
parameter diminishes. As the number of measurements becomes arbitrarily
large, $\sum Z_i^2 \to \infty$ and thus $V(b_1 - \beta_1) \to 0$. This means
that the variability of estimators obtained using (5.4.11) approaches zero
for a particular batch if a very large number of measurements are taken for
this batch.

Equations for the two-parameter cases involving Model 5 are given in
Problem 5.21.
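
The estimator (5.4.11b) and the variance (5.4.13) can be sketched in a few
lines of Python. This is only an illustration: the function name
`map_estimate_model2`, the prior values, and the readings are hypothetical
choices made for the sketch, not data from the text.

```python
def map_estimate_model2(x, y, sigma, prior_mean, prior_var):
    """Return (b1, V(b1 - beta1)) for eta_i = beta1 * X_i with a normal prior."""
    # Weighted sums built from F_i = Y_i / sigma_i and Z_i = X_i / sigma_i.
    sum_fz = sum(xi * yi / si ** 2 for xi, yi, si in zip(x, y, sigma))
    sum_zz = sum((xi / si) ** 2 for xi, si in zip(x, sigma))
    denom = sum_zz + 1.0 / prior_var
    b1 = (sum_fz + prior_mean / prior_var) / denom   # form of (5.4.11b)
    var_diff = 1.0 / denom                            # form of (5.4.13)
    return b1, var_diff


# Hypothetical batch: prior N(20, 0.1**2), measurement s.d. 0.4, X_i = 1.
readings = [20.5, 20.3, 20.6, 20.4]
for n in (1, 2, 4):
    b1, var_diff = map_estimate_model2(
        [1.0] * n, readings[:n], [0.4] * n, 20.0, 0.1 ** 2
    )
    print(n, round(b1, 4), round(var_diff, 5))
```

As more readings are included the printed variance shrinks, which is the
diminishing influence of the prior described above.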


5.4.2 Subjective Prior Information

Some authors such as Box and Tiao [3] regard the prior probability
distribution as a mathematical expression of degree of belief with respect
to a certain proposition. In this context the concept of developing
probabilities utilizing repeated observations is regarded merely as a means
of calibrating a subjective attitude. In this view to say that one thinks the
probability is one half that candidate A will be elected president means
that one has the same belief in the proposition “candidate A will be elected
president” as one would in the proposition “a toss of a fair coin will
produce a head.” We need not imagine an infinite series of elections in half
of which A is elected, and in half of which he is defeated.

This view can also be applied to the estimation of a physical property.
The following is an example given in reference 3. Two physicists, A and B,
are concerned with obtaining more accurate estimates of some physical
constant $\beta$, known only approximately. Imagine physicist A is very
familiar with previous measurements of $\beta$ and thus can make a
moderately good guess of the true $\beta$ value; let his prior opinion about
$\beta$ be approximately represented as a normal density centered at 900 and
having a standard deviation of 20,

$$ f_A(\beta) = \left[(2\pi)^{1/2}\, 20\right]^{-1} \exp\left[-\frac{1}{2}\left(\frac{\beta - 900}{20}\right)^2\right] \qquad (5.4.14a) $$

This implies that A believes that the chance of $\beta$ being outside the
interval of 860 to 940 is only about one in 20. By contrast, suppose that
physicist B has little experience regarding values of $\beta$ and that his
rather vague prior beliefs can be represented by a normal density with mean
of 800 and standard deviation of 200,

$$ f_B(\beta) = \left[(2\pi)^{1/2}\, 200\right]^{-1} \exp\left[-\frac{1}{2}\left(\frac{\beta - 800}{200}\right)^2\right] \qquad (5.4.14b) $$

We can see that B is much less certain of the true $\beta$ value because any
value between 400 and 1200 is considered plausible.

Suppose that one of the physicists performs an experiment and an
observation $Y$ of $\beta$ is made. Further assume that this measurement
contains an additive, zero mean, normal error with a standard deviation of
40. The probability density of $Y$ is the same as given by (5.4.2) with the
0.4 replaced by 40.

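To make the results more general let us use the notation $f(\beta_1|\mu)$
for the prior subjective information for $\beta_1$; for a normal
distribution we have

$$ f(\beta_1|\mu) = \left[(2\pi)^{1/2} \sigma_\mu\right]^{-1} \exp\left[-\frac{1}{2}\left(\frac{\beta_1 - \mu}{\sigma_\mu}\right)^2\right] \qquad (5.4.15) $$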



164 


aUPTER 5 INTRODUCTION TO LINEAR ESTIMATION 


The conditional probability density of/(K,, , y|,(/3,) is given by (546) 

For this case the use of Bayes’s theorem leads to maximizing the natural 
loganthm of the product ,Y„\^). or 


'*■ ^2 gl 

(54 16) 

which IS quite similar to (5 4 7) The estimate for /3, is 

^{F.~iiZ.)Z, 2F,Z+iiaj'^ 

6, o P.+ ■ T; . (5 4 I7a.b) 


2Z/ + C. 


2z/+c;^ 


which is identical to (S 4 1 la b). with ii being ng and Vg being ol It is also 
very similar to (5 3 l?a,b) which give ML esiimations for a combined 
analysis of two sets of observations 

As for the random parameter case the expected value of and the 
variance bj-jS] ate 

l'(6,-/!,).[i:z/ + o/]"' (54 18ab) 


Note that though the estimators given by (54 11) and (5 4 17) are identical 
in form, the meanings attached to the quantities Hg Vg, and are 
different 

Let us return to the example of the two physicists. For one measurement
$Y = 850$ the estimator $b$ and its variance for physicist A are (since
$X = 1$ for $\eta = \beta$)

$$ b_A = \frac{Y \sigma^{-2} + \mu_A \sigma_A^{-2}}{\sigma^{-2} + \sigma_A^{-2}} = \frac{(850)(40)^{-2} + 900(20)^{-2}}{40^{-2} + 20^{-2}} = 890 $$

$$ V(b_A) = \left(\sigma^{-2} + \sigma_A^{-2}\right)^{-1} = 320 $$

Repeating the same calculation for physicist B gives $b_B = 848$ and
$V(b_B) = 1538$. Note that though the observation was the same for both
physicists, the different normal prior distributions resulted in physicist A
having the posterior distribution $N(890, 17.9^2)$ and physicist B having
$N(848, 39.2^2)$. Hence physicists A and B have different estimates and
different standard deviations of 17.9 and 39.2, respectively.
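
The two posterior calculations for physicists A and B can be checked with a
short Python sketch. The helper name `posterior_from_one_measurement` is
introduced here only for illustration; the numbers are those quoted in the
example.

```python
def posterior_from_one_measurement(y, sigma, prior_mean, prior_sd):
    """Posterior mean and variance for one normal measurement and a normal prior."""
    w_data = sigma ** -2       # weight of the measurement
    w_prior = prior_sd ** -2   # weight of the prior
    b = (y * w_data + prior_mean * w_prior) / (w_data + w_prior)
    var = 1.0 / (w_data + w_prior)
    return b, var


b_a, v_a = posterior_from_one_measurement(850.0, 40.0, 900.0, 20.0)
b_b, v_b = posterior_from_one_measurement(850.0, 40.0, 800.0, 200.0)
print(round(b_a), round(v_a), round(v_a ** 0.5, 1))   # 890 320 17.9
print(round(b_b), round(v_b), round(v_b ** 0.5, 1))   # 848 1538 39.2
```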





We see that after the single observation the ideas of A and B about $\beta$
(represented by the posterior distributions) are much closer than before
using the observation. Note that A did not learn as much from the
experiment as did B. The reason is that for A the uncertainty in the
measurement indicated by $\sigma = 40$ was larger than that indicated by the
prior standard deviation, $\sigma_A = 20$. In contrast, for B the
uncertainty in the measurement was considerably smaller than that of B's
prior ($\sigma_B = 200$). For A the greater influence on the posterior
distribution is the prior, whereas for B the measurement has greater effect.
As, however, more and more $Y_i$ measurements are used for estimating
$\beta$, (5.4.17) and (5.4.18) indicate that the prior information has less
and less effect upon the estimate and its standard deviation.

5.4.3 Comparison of Viewpoints 

Three different types of prior information have been discussed. First, in 
Section 5.3.4 prior information from actual experiments is combined with 
that from a new set of experiments. Only maximum likelihood need be 
used and the ideas are relatively straightforward. In the MAP cases, which 
use Bayes’s theorem, the ideas are less clear and have been the subject of 
controversy. In the first case, the parameters are random, as in the case of 
the thermal conductivities of different batches of steel in the example 
above. In the second MAP case the parameters are not random but our 
prior belief can be incorporated into a subjective prior. 

For each viewpoint the form of the parameter estimators is identical.
The only differences are in symbols and meanings of the terms for the
prior mean and variance. In each case, the variance of $b_1 - \beta_1$ gives
the same mathematical expression.

Problem 5.21 gives the estimators for the two-parameter model (Model 
5). 


Example 5.4.1 

A scientist has measured a certain physical phenomenon and obtained the data
given below. From knowledge of his measuring device, the standard deviations
(and hence the variances) of the measurement errors are also given. From his
previous experience he feels that he can give a prior normal distribution
with a mean of 1.01 and a variance of 0.001 for the parameter.

    i     X_i      Y_i      sigma_i
    1     0.01     0.02     0.01
    2     0.1      0.12     0.05
    3     1        0.8      0.1
    4     10       13       2


The regression function is $\eta_i = \beta_1 X_i$ and the assumptions
regarding the data are

$$ Y_i = \eta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2), \qquad E(\varepsilon_i \varepsilon_j) = 0 \ (i \ne j), \qquad V(X_i) = 0 $$

and the $\sigma_i$ values are known. Estimate $\beta_1$ using (a) OLS,
(b) ML, and (c) MAP estimators. Also find the variance of the estimate in
each case.


Solution 

The assumptions given above can be designated 11011110. Various sets of
assumptions are used in the different estimator methods.

(a) The OLS estimator does not use any statistical assumption. Using (5.2.5)
the estimate is

$$ b_{1,OLS} = \frac{0.01(0.02) + 0.1(0.12) + 1(0.8) + 10(13)}{0.0001 + 0.01 + 1 + 100} = 1.2950 $$

The calculation of the variance of $b_{1,OLS}$ requires some assumptions; we
use those designated 1101-11-. With the nonconstant $\sigma_i^2$, (5.2.10) is
not valid for finding the variance. Instead the reader should derive

$$ V(b_{1,OLS}) = \frac{\sum X_i^2 \sigma_i^2}{\left(\sum X_i^2\right)^2} = \frac{0.0001(0.01)^2 + 0.01(0.05)^2 + 1(0.1)^2 + (100)(4)}{(101.0101)^2} = 0.0392 $$

(b) For ML estimation the assumptions needed are those given above. Prior
information is not used. From (5.3.4) and (5.3.7) we find

$$ b_{1,ML} = \frac{0.02(0.01)(0.01)^{-2} + \cdots + 13(10)(2)^{-2}}{(0.01)^2(0.01)^{-2} + (0.1)^2(0.05)^{-2} + 1^2(0.1)^{-2} + 10^2(2)^{-2}} = \frac{119.3}{130} = 0.9177 $$

$$ V(b_{1,ML}) = \left[\sum X_i^2 \sigma_i^{-2}\right]^{-1} = (130)^{-1} = 0.00769 $$

(c) For MAP estimation the subjective prior information is included. Using
the assumptions given above permits the use of (5.4.17b) and (5.4.18b) to get

$$ b_{1,MAP} = \frac{\sum F_i Z_i + \mu \sigma_\mu^{-2}}{\sum Z_i^2 + \sigma_\mu^{-2}} = \frac{119.3 + 1.01(0.001)^{-1}}{130 + (0.001)^{-1}} = 0.99938 $$

$$ V(b_{1,MAP}) = \left[\sum Z_i^2 + \sigma_\mu^{-2}\right]^{-1} = (1130)^{-1} = 0.000885 $$

For the OLS estimation no statistical assumptions are used; this implies
that no information is used regarding the errors. Maximum likelihood
estimation uses information regarding the measurement errors. MAP estimation
uses the prior information regarding the parameter in addition to the
information used in ML estimation. This suggests that the parameter variance
for ML would be less than that of OLS and that of MAP would be the smallest.
This is indeed what occurs in this example. However, if many additional
measurements are given, the prior information has less effect and the
disparity in the values given by ML and MAP is reduced. If the errors do not
have constant variance, the OLS values could be different from those given
by ML and MAP even for a large number of observations.
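
The three estimates of Example 5.4.1 can be reproduced with the following
Python sketch. It assumes, as the solution's arithmetic implies, that the
last column of the data table holds the standard deviations $\sigma_i$; the
variable names are chosen here for illustration.

```python
# Data of Example 5.4.1; sigma holds standard deviations (assumed reading of the table).
x = [0.01, 0.1, 1.0, 10.0]
y = [0.02, 0.12, 0.8, 13.0]
sigma = [0.01, 0.05, 0.1, 2.0]
mu, var_mu = 1.01, 0.001          # subjective prior mean and variance

# (a) OLS estimate and its variance with nonconstant error variances.
sxx = sum(xi ** 2 for xi in x)
b_ols = sum(xi * yi for xi, yi in zip(x, y)) / sxx
v_ols = sum(xi ** 2 * si ** 2 for xi, si in zip(x, sigma)) / sxx ** 2

# (b) ML estimate: weighted sums with Z_i = X_i / sigma_i and F_i = Y_i / sigma_i.
sum_zz = sum((xi / si) ** 2 for xi, si in zip(x, sigma))
sum_fz = sum(xi * yi / si ** 2 for xi, yi, si in zip(x, y, sigma))
b_ml = sum_fz / sum_zz
v_ml = 1.0 / sum_zz

# (c) MAP estimate: the prior adds mu / var_mu and 1 / var_mu to the sums.
b_map = (sum_fz + mu / var_mu) / (sum_zz + 1.0 / var_mu)
v_map = 1.0 / (sum_zz + 1.0 / var_mu)

for name, b, v in (("OLS", b_ols, v_ols), ("ML", b_ml, v_ml), ("MAP", b_map, v_map)):
    print(f"{name}: b1 = {b:.5f}, variance = {v:.6f}")
```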


5.5 MULTIPLE DATA POINTS 

One way to gain insight into the assumption of the constant error variance
(that is, $\sigma_i^2 = \sigma^2$) is to use repeated measurements. For
Models 2, 3, and 4, this means to have more than one measurement of $Y$ at
each $X_i$. For Model 5 repeated measurements occur when there is more than
one $Y_i$ value at each combination of $X_{i1}$ and $X_{i2}$. Repeated
measurements are not always possible to obtain, but whenever possible they
should be obtained for each new problem until the nature of the dependence
of $\sigma_i^2$ on $i$ is understood. Furthermore, multiple data points
could be useful in investigating the validity of other assumptions such as
those of zero mean, uncorrelated, and normal errors.

In some cases repeated measurements can be simply obtained by in- 
vestigating another specimen at the “same” conditions. In other cases, 
repeated measurements can be obtained by using several sensors attached 
to the same specimen. An example of the latter is for temperature measure- 
ments in solids and fluids; the thermocouples (if they are used) might be 
all placed to measure the same temperature. The same could be true for 
other sensors as well. 

It is important to distinguish between repeated measurements and
taking repeated readings of the same measurement. A failure to do so may 
lead to inefficient design of experiments and to erroneous statements 
regarding accuracy of the parameters. The difference between repeated 
measurements and those that are essentially repeated readings can be 
illustrated by an example involving the temperature history of a solid 
copper block that is initially hot and then allowed to cool in open air. 
Several thermocouples are attached to it. Because of the high thermal 
conductivity of the copper the temperature of the block is quite uniform 
throughout it at any given time. The temperature of the block gradually 
decreases with time, however. 

Consider first a given thermocouple. At any time the thermocouple 
would yield a temperature measurement which is in error owing to a 
number of different factors. Perhaps the largest factor is that due to
calibration errors. Over the whole calibration temperature range the
average error is nearly zero, but at most temperatures the calibration error
is not zero. Hence if several temperature measurements are made with only
a short time interval between them, the “same” calibration errors would be
in each measurement. Very nearly the same measurements would be
obtained so that these could be considered repeated readings of the same
measurement. These repeated readings may contain random components,
but the variance would be small compared to the calibration error.

A repeated measurement of the temperature at a specified time is more
appropriately given by another thermocouple embedded in the specimen. It
too would have a calibration error, but the error would be independent of
that of the first one (provided the calibrations are independently made for
each sensor).

If a measurement is taken at some later time when the temperature has
dropped considerably, the calibration error in the temperature measurement
will be nearly independent of the early measurements for the same
sensor.

It is also possible to obtain repeated measurements involving thermocouples
(or other sensors) using the same sensor. This would occur in the
above example if the calibration were very good and the associated
variance were small compared to fluctuations in the readings due to
electronic noise. For example, it might be that unbiased measurements of
the temperature of a stirred water-ice mixture would produce values of
0.11, -0.06, -0.01, 0.03, 0.05°C when the correct value is 0°C. The
same type of random measurements might be produced for small or large
time spacing between the measurements. In this case the errors are random
with zero mean. These measurements can be considered repeated values
even if the “same” specimen and sensor are used. The above examples
illustrate that it is necessary to be careful to distinguish between repeated
measurements and repeated readings.

5.5.1 Sum of Squares 

The case of ordinary least squares is first considered. One can always
number the observations so that we can write

$$ S = \sum_{i=1}^{n} (Y_i - \eta_i)^2 \qquad (5.5.1) $$

If there are any repeated values the estimators given in Section 5.2 still
apply. Some saving in effort, however, can sometimes be achieved by
denoting the observations and the regression function differently. There
might be $m_1$ measurements of $Y$ at $X_1$, $m_2$ measurements at
$X_2$, ..., and $m_r$ at $X_r$. Typically the $Y$ values will be designated
$Y_{ij}$ for location $X_i$ with $j = 1, \ldots, m_i$. Then (5.5.1) can be
written

$$ S = \sum_{i=1}^{r} \sum_{j=1}^{m_i} (Y_{ij} - \eta_i)^2 \quad \text{where} \quad \sum_{i=1}^{r} m_i = n \qquad (5.5.2) $$

Let us now derive another expression for $S$ that is frequently easier to
use than (5.5.2). It applies equally well for both linear and nonlinear cases
and shows that minimizing $S$ need involve only means of the $Y_{ij}$'s for
each $i$. Consider first the identity,

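$$ Y_{ij} - \eta_i = (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \eta_i) \qquad (5.5.3) $$

where $\bar{Y}_i$ and another mean (to be used later) are

$$ \bar{Y}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{r} m_i \bar{Y}_i \qquad (5.5.4) $$

Squaring and summing (5.5.3) over $i$ and $j$ gives

$$ S = \sum_{i,j} (Y_{ij} - \eta_i)^2 = \sum_{i,j} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i} m_i (\bar{Y}_i - \eta_i)^2 + 2 \sum_{i,j} (Y_{ij} - \bar{Y}_i)(\bar{Y}_i - \eta_i) \qquad (5.5.5a) $$

$$ \phantom{S} = \sum_{i,j} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i} m_i (\bar{Y}_i - \eta_i)^2 \qquad (5.5.5b) $$

The cross-product sum in (5.5.5a) is zero because the summation on $j$ is
equal to zero. Note that the first summation in (5.5.5b) is not a function of
the parameters. Hence for linear and nonlinear parameter estimation
problems with repeated measurements the same parameters will be found if we
start with the function

$$ S_1 = \sum_{i=1}^{r} m_i (\bar{Y}_i - \eta_i)^2 \qquad (5.5.6) $$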

rather than (5.5.2). Note that (5.5.6) requires less computation, however. If
the measurement errors are independent, but have variances dependent
only on $i$, maximum likelihood estimation (with the assumptions 11-11111)
can be performed by minimizing

$$ \sum_{i=1}^{r} \left(F_i - m_i^{1/2} \sigma_i^{-1} \eta_i\right)^2 \qquad (5.5.7) $$

where

$$ F_i \equiv m_i^{1/2}\, \bar{Y}_i\, \sigma_i^{-1} \qquad (5.5.8) $$

When estimating parameters using repeated measurements, it is necessary
that $r \ge p$ where $p$ is the number of parameters. In Model 3, for
example, estimates of $\beta_0$ and $\beta_1$ would require measurements at
no less than two different $X_i$ values regardless of how large $n$ is.
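
The decomposition (5.5.5) can be verified numerically. The sketch below is
only an illustration: the function names are introduced here, the
observations are the eight values that also appear in Example 5.5.1, and the
trial values of $\eta_i$ are arbitrary.

```python
def s_full(groups, eta):
    """Sum of squares over every observation, as in (5.5.2)."""
    return sum((y_ij - eta_i) ** 2 for ys, eta_i in zip(groups, eta) for y_ij in ys)


def s_split(groups, eta):
    """Return (pure-error term, S1), the two terms on the right of (5.5.5b)."""
    pure_error = 0.0
    s1 = 0.0
    for ys, eta_i in zip(groups, eta):
        m_i = len(ys)
        y_bar_i = sum(ys) / m_i
        pure_error += sum((y_ij - y_bar_i) ** 2 for y_ij in ys)
        s1 += m_i * (y_bar_i - eta_i) ** 2
    return pure_error, s1


# Four observations at each of two X values.
groups = [[0.258, 0.966, 2.453, 1.963], [9.418, 10.792, 8.626, 8.778]]
for eta in ([1.0, 9.0], [1.5, 10.0]):          # arbitrary trial values of eta_i
    pure_error, s1 = s_split(groups, eta)
    print(eta, round(s_full(groups, eta), 4), round(pure_error + s1, 4))
```

The pure-error term is the same for every trial $\eta_i$, so only $S_1$
needs to be minimized.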

5.5.2 Parameter Estimates

Parameters can be estimated by minimizing (5.5.7) for various models
given in this chapter. Economy in obtaining estimators can be obtained by
utilizing previous results. Consider first Model 2 ($\eta_i = \beta_1 X_i$)
and ML estimation with the assumptions 11-11111. Then $b_1$ is given by
(5.3.6) with $F_i$ as defined by (5.5.8) and $Z_i$ by

$$ Z_i \equiv X_i\, m_i^{1/2}\, \sigma_i^{-1} \qquad (5.5.9) $$

The variance of $b_1$ is given by (5.3.7) with $Z_i$ defined by (5.5.9).

For Model 5 given by $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, estimators
$b_1$ and $b_2$ and their variances and covariance can be obtained from
(5.2.24) and (5.2.26) by letting

$$ X_{i1} \to Z_{i1} = X_{i1}\, m_i^{1/2} \sigma_i^{-1}, \qquad X_{i2} \to Z_{i2} = X_{i2}\, m_i^{1/2} \sigma_i^{-1} \qquad (5.5.10a,b) $$

$$ Y_i \to F_i, \qquad \sigma^2 \to 1 \qquad (5.5.10c,d) $$

For Model 3, $\eta_i = \beta_0 + \beta_1 X_i$, the ML results can be obtained
from the above procedure or more simply from (5.3.10-12) by replacing
$\sigma_i^2$ by $\sigma_i^2/m_i$ and $Y_i$ by $\bar{Y}_i$.

The number of terms related to $Y$ and $\eta$ has increased in this section.
In addition to the observed value $Y_{ij}$, there is the value $\bar{Y}_i$,
which is the average of the $Y_{ij}$ values at a given $X_i$; $\hat{Y}_i$ is
the predicted regression value at $X_i$; $\eta_i$ is the actual regression
value at $X_i$, that is, by definition $E(Y_{ij})$ and thus the expected
value of $\bar{Y}_i$ and of $\hat{Y}_i$ also; and $\bar{Y}$ is the weighted
average of the $\bar{Y}_i$ values over all the $X_i$ values. These symbols
are illustrated by Fig. 5.4.






Figure 5.4 Relationships among observations, etc. for repeated measurements. 


Example 5.5.1 

Four measurements are made for both $X_1 = 0$ and $X_2 = 80$ with the same
errors $\varepsilon_i$ as in Example 5.2.4 except the fifth error is not
used. Then the $Y_{1j}$ measurements at $X_1 = 0$ are 0.258, 0.966, 2.453,
and 1.963, whereas at $X_2 = 80$, $Y_{2j}$ is 9.418, 10.792, 8.626, and
8.778. The assumptions of additive, zero mean, constant variance,
uncorrelated, normal errors, and errorless $X_i$ are valid. There is no
prior information and $\sigma^2$ is unknown.

(a) Using expressions developed in this section, estimate the parameters
$\beta_0$ and $\beta_1$ in Model 3.

(b) Find the estimated standard errors of $b_0$ and $b_1$.


Solution 

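(a) With the assumptions given, 11111011, the estimates can be obtained
using OLS or ML. The simplest expressions to use are those given by
(5.3.10-12) by replacing $\sigma_i^2$ by $\sigma^2/m_i$ and $Y_i$ by
$\bar{Y}_i$. Since $\sigma^2$ is a constant, (5.3.10) and (5.3.11) can be
written

$$ b_1 = \frac{\sum_{i=1}^{r} m_i \bar{Y}_i (X_i - \bar{X})}{\sum_{i=1}^{r} m_i (X_i - \bar{X})^2} \qquad (a) $$

$$ b_0 = \bar{Y} - b_1 \bar{X} \qquad (b) $$

$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{r} m_i X_i \qquad (c) $$

In the above equations $r = 2$, $X_1 = 0$, and $X_2 = 80$; also

$$ \bar{X} = \frac{1}{8}[4(0) + 4(80)] = 40 $$

$$ \bar{Y}_1 = \frac{1}{m_1} \sum_{j=1}^{4} Y_{1j} = \frac{1}{4}[0.258 + \cdots + 1.963] = 1.410 $$

$$ \bar{Y}_2 = \frac{1}{m_2} \sum_{j=1}^{4} Y_{2j} = \frac{1}{4}[9.418 + \cdots + 8.778] = 9.4035 $$

$$ \bar{Y} = \frac{1}{2}(1.410 + 9.4035) = 5.40675 $$

Then using the expression (a) for $b_1$ we obtain

$$ b_1 = \frac{1.41(4)(-40) + 9.4035(4)(40)}{4(1600) + 4(1600)} = 0.09991875 $$

and from (b), $b_0 = 5.40675 - 0.09991875(40) = 1.410$.

(b) The expressions for the estimated standard errors can be obtained from
(5.3.12) by replacing $\sigma_i^{-2}$ by $m_i s^{-2}$ to get

$$ \text{est. s.e.}(b_0) = s \left[\frac{\sum m_i X_i^2 / n}{\sum m_i (X_i - \bar{X})^2}\right]^{1/2} \qquad (d) $$

$$ \text{est. s.e.}(b_1) = s \left[\sum m_i (X_i - \bar{X})^2\right]^{-1/2} \qquad (e) $$

Since there are two $X_i$ values and two parameters, the predicted line
passes through $\bar{Y}_1$ and $\bar{Y}_2$. Then the minimum sum of squares
resulting from (5.5.2) is the first term on the right side of (5.5.5),

$$ S_{min} = \sum_{i,j} (Y_{ij} - \bar{Y}_i)^2 = 2.9179 + 2.9239 = 5.8418 $$

so that the estimated variance of the errors is

$$ s^2 = S_{min}/(n - 2) = 5.8418/6 = 0.9736 $$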



and the estimated standard error is $s = 0.9867$. Then using (d) and (e),

$$ \text{est. s.e.}(b_0) = 0.9867 \left[\frac{6400(4)/8}{2(1600)(4)}\right]^{1/2} = 0.4933 $$

$$ \text{est. s.e.}(b_1) = 0.9867\,[2(1600)(4)]^{-1/2} = 0.00872 $$

Though the value of $b_0$ is less accurate than that given in Example 5.2.4,
the variances are smaller in this example than in Example 5.2.4. These
estimated variances corroborate the theoretical result that smaller
estimated variances are generally obtained for Models 3 and 4 by
concentrating the measurements at the minimum and maximum $X_i$ values.
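
The arithmetic of Example 5.5.1 can be reproduced with a short Python
sketch. The variable names are chosen here for illustration; the formulas
are the group-mean expressions (a), (b), (d), and (e) as described in the
solution.

```python
from math import sqrt

# Data of Example 5.5.1: four observations at X = 0 and four at X = 80.
x = [0.0, 80.0]
groups = [[0.258, 0.966, 2.453, 1.963], [9.418, 10.792, 8.626, 8.778]]

m = [len(ys) for ys in groups]
n = sum(m)
y_bar_i = [sum(ys) / len(ys) for ys in groups]               # local means
x_bar = sum(mi * xi for mi, xi in zip(m, x)) / n
y_bar = sum(mi * yb for mi, yb in zip(m, y_bar_i)) / n
sxx = sum(mi * (xi - x_bar) ** 2 for mi, xi in zip(m, x))

b1 = sum(mi * yb * (xi - x_bar) for mi, yb, xi in zip(m, y_bar_i, x)) / sxx
b0 = y_bar - b1 * x_bar

# Residual sum of squares about the fitted line and the estimated variance.
s_min = sum((y - (b0 + b1 * xi)) ** 2 for ys, xi in zip(groups, x) for y in ys)
s2 = s_min / (n - 2)

se_b0 = sqrt(s2 * sum(mi * xi ** 2 for mi, xi in zip(m, x)) / (n * sxx))
se_b1 = sqrt(s2 / sxx)
print(b0, b1, s_min, s2, se_b0, se_b1)
```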


5.6 COEFFICIENT OF MULTIPLE DETERMINATION 

In this section the sums of squares are compared for two different models
applied to the same data. Ordinary least squares is used as the estimation
procedure. The analysis will start in sufficient generality to permit the
models to be linear or nonlinear in the parameters. Later the results are
specialized to Models 1 and 4. In the following discussion we consider two
models, designated A and B. Frequently Model B has the same functional
form and parameters as Model A except there is an additional parameter
in Model B. Many authors restrict the meaning of $R^2$ to the case where
Model A is Model 1.

Let ${}_A\hat{Y}_i$ be the predicted value of $Y_i$ for Model A and
${}_B\hat{Y}_i$ for Model B. We start with the identity

$$ Y_i - {}_A\hat{Y}_i = (Y_i - {}_B\hat{Y}_i) + ({}_B\hat{Y}_i - {}_A\hat{Y}_i) \qquad (5.6.1) $$

which can be also written as

$$ {}_Ae_i = {}_Be_i + ({}_B\hat{Y}_i - {}_A\hat{Y}_i) \qquad (5.6.2) $$

for which the residuals for Models A and B are defined by

$$ {}_Ae_i \equiv Y_i - {}_A\hat{Y}_i \quad \text{and} \quad {}_Be_i \equiv Y_i - {}_B\hat{Y}_i \qquad (5.6.3) $$

Let us square and sum (5.6.2) over $i$ to get

$$ \sum {}_Ae_i^2 = \sum {}_Be_i^2 + \sum ({}_B\hat{Y}_i - {}_A\hat{Y}_i)^2 + 2 \sum {}_Be_i ({}_B\hat{Y}_i - {}_A\hat{Y}_i) \qquad (5.6.4a) $$

$$ \text{SST} = \text{SSE} + \text{SSR} + 2\,\text{SC} \qquad (5.6.4b) $$



Each term in (5.6.4b) corresponds to the term in (5.6.4a) directly above.
Note that SST is the minimum sum of squares for Model A and SSE is
the minimum sum of squares for Model B. Let us specify Models A and B
so that

$$ \text{SST} = \sum {}_Ae_i^2 \ge \sum {}_Be_i^2 = \text{SSE} \qquad (5.6.5) $$

which would be always true if Model A could be obtained from Model B
by making a certain parameter in Model B equal to zero.
Divide (5.6.4) by the left side and rearrange to the form

$$ R^2 = 1 - \frac{\text{SSE}}{\text{SST}} \qquad (5.6.6) $$

where $R^2$ is called the coefficient of multiple determination and is
defined by

$$ R^2 \equiv \frac{\text{SSR} + 2\,\text{SC}}{\text{SST}} \qquad (5.6.7) $$

Because of condition (5.6.5) an examination of (5.6.6) reveals that
$0 \le R^2 \le 1$, where $R^2 \approx 0$ corresponds to both models being
nearly as effective and $R^2 \approx 1$ corresponds to Model B being much
better than Model A. Then $R^2$ can be used to say something about the
improvement in the “goodness of fit,” $R^2 = 0$ being the poorest and
$R^2 = 1$ being the best improvement in using Model B rather than Model A.

For nonlinear problems, the parameter estimates and sum of squares can
be found separately for Models A and B and then $R^2$ would be evaluated
using (5.6.6). For the simple linear models given next a simplified form of
(5.6.7) is frequently used.


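A classical case considered in connection with $R^2$ is for Models 1 and 4
being A and B, respectively.

$$ \text{Model A:} \quad Y_i = \beta_0 + \varepsilon_i, \qquad {}_A\hat{Y}_i = \bar{Y} \qquad (5.6.8) $$

$$ \text{Model B:} \quad Y_i = \beta_0 + \beta_1 (X_i - \bar{X}) + \varepsilon_i, \qquad {}_B\hat{Y}_i = \bar{Y} + b_1 (X_i - \bar{X}) \qquad (5.6.9) $$

The term SC in (5.6.4b) and (5.6.7) is then

$$ \text{SC} = \sum {}_Be_i \left[b_1 (X_i - \bar{X})\right] = 0 \qquad (5.6.10) $$

where the normal equation for Model 4 and parameter $\beta_1$ was used.
Hence $R^2$ can be calculated from (5.6.7), which becomes

$$ R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{b_1^2 \sum (X_i - \bar{X})^2}{\sum (Y_i - \bar{Y})^2} = \frac{b_1 \sum (X_i - \bar{X}) Y_i}{\sum (Y_i - \bar{Y})^2} \qquad (5.6.11) $$

where $\bar{Y}$ is associated with Model A (Model 1 in this case) and
$\hat{Y}_i$ with Model B (i.e., 4). If $\hat{Y}_i = Y_i$, that is, the
prediction is perfect, then $R^2 = 1$. If $\hat{Y}_i = \bar{Y}$, that is,
$b_1 = 0$ or the model $Y_i = \beta_0 + \varepsilon_i$ alone fits the data,
$R^2 = 0$. Thus $R^2$ is a measure of the usefulness of the term
$\beta_1 (X_i - \bar{X})$ in the model, it being not needed for
$R^2 \approx 0$ and needed for $R^2 \approx 1$. $R^2$ as given by (5.6.11) is
the square of the sample correlation coefficient (see (2.6.17)).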

Example 5.6.1 

Investigate the goodness of fit as indicated by $R^2$ for Example 5.2.4.

Solution

Using (5.6.11) and values given in Example 5.2.4 gives

$$ R^2 = \frac{(0.101988)^2 (6000)}{\sum (Y_i - 5.366)^2} = 0.9131 $$

which is nearly unity, indicating that the $\beta_1 (X_i - \bar{X})$ term may
be needed in the model.
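
The equivalence of (5.6.6) and the middle form of (5.6.11) for Models 1 and
4 can be checked with a small Python sketch. The data below are hypothetical
(the observations of Example 5.2.4 are not reproduced in this section), and
the function name is introduced only for illustration.

```python
def r_squared_line_vs_mean(x, y):
    """Return R^2 computed two ways: 1 - SSE/SST and b1^2 * Sxx / SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    sst = sum((yi - y_bar) ** 2 for yi in y)                   # Model A residuals
    sse = sum((yi - y_bar - b1 * (xi - x_bar)) ** 2            # Model B residuals
              for xi, yi in zip(x, y))
    return 1.0 - sse / sst, b1 ** 2 * sxx / sst


# Hypothetical observations, roughly linear in X.
x = [0.0, 10.0, 20.0, 30.0, 40.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
r2_from_sse, r2_from_b1 = r_squared_line_vs_mean(x, y)
print(r2_from_sse, r2_from_b1)
```

Both expressions agree to rounding error because the cross-product term SC
vanishes for the least squares fit.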


5.7 ANALYSIS OF VARIANCE ABOUT THE SAMPLE MEAN 


The subject of analysis of variance is a broad one and contains many 
different facets. In this section only certain aspects of the analysis of 
variance (ANOVA) are considered. 

The preceding section employed no statistical information and thus no
probabilistic statements could be made. This section uses many of the
standard assumptions. Assume that the errors are additive, uncorrelated,
and normal and have zero mean and constant variance. The value $\sigma^2$ is
unknown and there is no prior information regarding the constant
parameters. The $X_i$ values are nonstochastic (i.e., errorless). These
assumptions are designated 11111011.

For Models 1 and 4 given by (5.6.8) and (5.6.9), equation (5.6.4a) can
be written

$$ \sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 \qquad (5.7.1a) $$

$$ \text{SST} = \text{SSE} + \text{SSR} \qquad (5.7.1b) $$


$\bar{Y}$ is for Model A (or 1) and $\hat{Y}_i$ is for Model B (or 4). The
sum of squares on the left side of (5.7.1a) is sometimes called the total sum
of squares and designated SST. The first term on the right of (5.7.1a) is
called the error sum of squares, SSE. The remaining term in (5.7.1a) is
called the regression sum of squares, SSR. It can be proved that SSE and SSR
are independent.

Any sum of squares has associated with it a number called its degrees of
freedom. Let the sum of squares be written as a sum of the squares of
independent linear forms. (A linear form, for example, is $\sum a_i y_i$
where the $a_i$'s are constants and the $y_i$'s are variables.) Then the
number of independent linear forms is the number of degrees of freedom. The
sum of squares of $Y_i - \hat{Y}_i$ for the assumptions 11111011 has $n - p$
degrees of freedom, for $n$ being the number of observations and $p$ the
number of independent parameters. Hence SST has $n - 1$ degrees of freedom
and SSE has $n - 2$. Since SSE and SSR are independent, we know from
Cochran's theorem [4] that the sum of the degrees of freedom of SSE and of
SSR is equal to the degrees of freedom of SST. This information can be used
to obtain that which is displayed in Table 5.4.

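Table 5.4 ANOVA Table for Partition of Variance About $\bar{Y}$, (5.7.1)

    Source of Variation         Sum of Squares                        Degrees of   Mean Square
                                                                      Freedom
    1. Residual                 SSE = $\sum (Y_i - \hat{Y}_i)^2$      n - 2        $s^2$ = SSE/(n - 2)
    2. Deviation between        SSR = $\sum (\hat{Y}_i - \bar{Y})^2$  1            SSR/1
       line and mean
    3. Total                    SST = $\sum (Y_i - \bar{Y})^2$        n - 1

The ratio of the two mean squares in Table 5.4 is the statistic

$$ F = \frac{\text{SSR}/1}{\text{SSE}/(n - 2)} = \frac{\text{SSR}}{s^2} \qquad (5.7.2) $$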





This statistic can provide a measure of how much the additional parameter
$\beta_1$ (i.e., using the model $Y_i = \beta_0 + \beta_1 (X_i - \bar{X}) +
\varepsilon_i$ rather than $Y_i = \beta_0 + \varepsilon_i$) is needed. If $F$
is near unity [corresponding to $R^2 \approx 0$ in (5.6.11)], then the
two-parameter model (Model 4) does not significantly improve the fit
compared to the one-parameter model (Model 1). The other extreme is
large $F$ [which corresponds to $R^2 \approx 1$ in (5.6.11)]; in this case
we can be confident that the $\beta_1$ parameter is needed.

A probability statement can be made utilizing the $F$ statistic and a table
of its distribution, which could be used to obtain the value of
$F_{1-\alpha}(1, n-2)$. See Section 2.8.10. The probability of $F$ being less
than $F_{1-\alpha}(1, n-2)$ is $1 - \alpha$, or

$$ P[F < F_{1-\alpha}(1, n-2)] = 1 - \alpha \qquad (5.7.3) $$

Alternatively we can write

$$ P[F > F_{1-\alpha}(1, n-2)] = \alpha \qquad (5.7.4) $$

In words, if the null hypothesis $H_0\colon \beta_1 = 0$ is true, the
probability that the calculated value $F$ exceeds the tabulated value is
$\alpha$. If $F$ is greater than $F_{1-\alpha}(1, n-p)$, we reject the null
hypothesis at the given significance level $\alpha$. If the calculated $F$
value is less than $F_{1-\alpha}(1, n-p)$, we say that we cannot reject the
null hypothesis; that is, it may be that $\beta_1 = 0$.

Example 5.7.1 

Using the data of Example 5.2.4 develop an analysis of variance table and
determine if the parameter $\beta_1$ is needed. Make the probability 1% of
falsely deciding that $\beta_1$ is needed.

Solution 

Using the data from Example 5.2.4 the following ANOVA table is constructed.

    Source                      Sum of      Degrees of   Mean       Calculated
                                Squares     Freedom      Square     F
    1. Residual                 5.9377      7            0.92100
    2. Deviation between        62.4097     1            62.4097    62.4097/0.92100 = 67.763
       line and mean
    3. Total                    68.3474     8

From a table of the $F$ distribution, we find

$$ F_{1-\alpha}(1, n-2) = F_{0.99}(1, 7) = 12.25 $$

Since $F > F_{0.99}(1, 7)$ we reject the null hypothesis that $\beta_1$ is
not needed. Our method has only a 1% chance of causing us to use the model
$\eta_i = \beta_0 + \beta_1 (X_i - \bar{X})$ rather than $\eta_i = \beta_0$
when $\beta_1$ is actually zero.

The use of the $F$ test for model building is considered further in
Chapter 6.
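
The decision rule of Example 5.7.1 amounts to one division and one
comparison, as the Python sketch below shows. The mean squares and the
critical value 12.25 are copied from the example as tabulated; the function
name `f_ratio` is introduced here for illustration.

```python
def f_ratio(ms_regression, ms_residual):
    """Ratio of the regression mean square to the residual mean square."""
    return ms_regression / ms_residual


F_CRIT_99 = 12.25                       # tabulated F_{0.99}(1, 7) quoted in the example
f_value = f_ratio(62.4097, 0.92100)     # mean squares as tabulated in the example
reject_h0 = f_value > F_CRIT_99         # True means beta_1 is judged to be needed
print(round(f_value, 3), reject_h0)
```

For other sample sizes or significance levels the critical value would have
to be read from an $F$ table or computed with a statistics library.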


5.8 ANALYSIS OF VARIANCE ABOUT THE REGRESSION LINE
FOR MULTIPLE MEASUREMENTS AT EACH $X_i$

Consider the case of partitioning the variation about the predicted
regression line for multiple measurements at each $X_i$. From (5.5.5), which
applies for linear and nonlinear parameter estimation, we have

$$ \sum_{i=1}^{r} \sum_{j=1}^{m_i} (Y_{ij} - \hat{Y}_i)^2 = \sum_{i=1}^{r} \sum_{j=1}^{m_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{r} m_i (\bar{Y}_i - \hat{Y}_i)^2 \qquad (5.8.1) $$

$$ SS_r = SS_e + SS_l $$

where

    $SS_r$   total sum of squares* between data and regression line,
             “residuals” (d.f. = n - p)
    $SS_e$   sum of squares within data sets, “pure error sum of squares”
             (d.f. = n - r)
    $SS_l$   sum of squares of local means about regression line,
             “lack of fit sum of squares” (d.f. = r - p)

    *$SS_r$ is our former SSE.

and d.f. stands for degrees of freedom. The number of degrees of freedom on
the left has been discussed previously; it is the total number of points
minus the number of parameters. The first term on the right has the
contribution from $i = 1$ of

$$ \sum_{j=1}^{m_1} (Y_{1j} - \bar{Y}_1)^2 $$

which has $m_1 - 1$ degrees of freedom; the second contribution ($i = 2$)
would have $m_2 - 1$ degrees of freedom. Hence for the first term on the
right-hand side of (5.8.1), the number of degrees of freedom is

$$ \text{d.f.} = \sum_{i=1}^{r} (m_i - 1) = n - r \qquad (5.8.2) $$



The number of degrees of freedom of the last term is given by subtraction.
The various terms are labeled $SS_r$, $SS_e$, and $SS_l$; note that the terms
are not completely analogous to those in (5.7.1), but are similarly labeled.
In fact, (5.8.1) can be used in (5.7.1) to get

$$ \text{SST} = \text{SSE} + \text{SSR} = [SS_e + SS_l] + \text{SSR} \qquad (5.8.3) $$

where an additional summation is used in (5.7.1), and then

$$ \text{SST} = \sum_{i=1}^{r} \sum_{j=1}^{m_i} (Y_{ij} - \bar{Y})^2 \qquad (5.8.4) $$

$$ \text{SSR} = \sum_{i=1}^{r} m_i (\hat{Y}_i - \bar{Y})^2 \qquad (5.8.5) $$

where $\bar{Y}$ is defined by (5.5.4).

Table 5.5 shows the analysis of variance table for (5.8.1) in lines 2 and 
3; the table as a whole illustrates (5.8.3). 

The mean square s^, which is defined by 



(5.8.6) 


Table 5.5 ANOVA Table for Partition of Variance About $\hat{Y}_i$, (5.8.1),
and About $\bar{Y}$, (5.8.3)

Source of Variation           Sum of Squares                                    Degrees of Freedom   Mean Square

1. Pure error sum of          $SS_e = \sum\sum (Y_{ij}-\bar{Y}_i)^2$            $n-r$                $s_e^2 = SS_e/(n-r)$
   squares
2. Lack of fit sum of         $SS_l = \sum m_i(\bar{Y}_i-\hat{Y}_i)^2$          $r-p$                $s_l^2 = SS_l/(r-p)$
   squares
3. Residual sum of            $SS_r = \sum\sum (Y_{ij}-\hat{Y}_i)^2$            $n-p$                $s^2 = SS_r/(n-p)$
   squares
4. Sum of squares             $SSR = \sum m_i(\hat{Y}_i-\bar{Y})^2$             $p-1$                $SSR/(p-1)$
   between line and mean
5. Sum of squares             $SST = \sum\sum (Y_{ij}-\bar{Y})^2$               $n-1$
   between data and mean




is an unbiased estimate of $\sigma^2$ even if the true model is not used or if the
model is nonlinear. Hence this estimate of $\sigma^2$ is said to arise from "pure
error." On the other hand, $s^2$,

$$s^2 = \frac{SS_r}{n-p}$$    (5.8.7)

is not an unbiased estimate of $\sigma^2$ if the model is incorrect.

5.8.1 Expected Values of $s^2$ for Incorrect Model

Let us investigate the effect upon $s^2$ of an incorrect mathematical model.
We recall that $e_{ij}$ is the residual for the $j$th measurement at $X_i$; it "contains
all available information on the ways in which the fitted model fails to
properly explain the observed variation in the dependent variable $Y$" [1, p.
26]. Recalling the definition of the residual, $e_{ij} = Y_{ij}-\hat{Y}_i$, and writing

$$e_{ij} = (Y_{ij}-\hat{Y}_i) - E(Y_{ij}-\hat{Y}_i) + E(Y_{ij}-\hat{Y}_i)$$    (5.8.8)

$$\quad\;\; = q_{ij} + B_i$$    (5.8.9)

where

$$q_{ij} = (Y_{ij}-\hat{Y}_i) - E(Y_{ij}-\hat{Y}_i), \qquad B_i = E(Y_{ij}-\hat{Y}_i) = \eta_i - E(\hat{Y}_i)$$    (5.8.10)

$B_i$ is called the bias error at $X_i$; it is zero if the model is correct.
The random variable $q_{ij}$ has a zero mean whether the model is correct or not
since $E(Y_{ij}) = \eta_i$ is true in any case. These statements regarding $B_i$ and $q_{ij}$
are true for nonlinear as well as linear models.

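For Model 5 with the assumptions denoted 1111--11 (except that $E(\hat{Y}_i) \neq \eta_i$)
it can be shown for OLS and ML estimation that

$$E(s^2) = \frac{1}{n-p}\sum_{i}\sum_{j}\left\{ V(Y_{ij}-\hat{Y}_i) + \left[\eta_i - E(\hat{Y}_i)\right]^2 \right\}$$    (5.8.11)

which reduces to

$$E(s^2) = \sigma^2 + \frac{1}{n-p}\sum_{i}\sum_{j} B_i^2$$    (5.8.12)

where (5.2.31) is used. If the model is correct, the last term in (5.8.12)
disappears.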





When the model is incorrect, the residuals contain both random ($q_{ij}$) and
systematic or biased components ($B_i$), which are respectively called variance
and bias error components of the residuals. An incorrect model
results in an inflated residual mean square.

5.8.2 F Test with Repeated Data 

For this case of repeated observations, an F statistic is (for $p = 2$)

$$F_l = \frac{s_l^2}{s_e^2} = \frac{SS_l/(r-2)}{SS_e/(n-r)}$$    (5.8.13)

where numerator and denominator contain $\chi^2$ distributions if the model is
correct; $s_l^2$ is called the mean square due to lack of fit. This $F_l$ value should
be compared with $F_{1-\alpha}(r-2, n-r)$. If $F_l > F_{1-\alpha}(r-2, n-r)$, we say that
$F_l$ is significant and we mean that the model is inadequate. An estimate of
$\sigma^2$ using $s_e^2$ would be unbiased, but using $s^2$ or $s_l^2$ would be biased and tend
to yield too large an estimate. If, on the other hand, $F_l < F_{1-\alpha}(r-2, n-r)$,
$F_l$ is said to be not significant; there is no reason to doubt the adequacy of
the model and both the pure error and lack of fit mean squares ($s_e^2$ and $s_l^2$)
can be used as estimates of $\sigma^2$. Moreover, $s^2$ is a pooled estimate of $\sigma^2$. See
Fig. 5.5 for a schematic diagram summarizing the steps for checking for
lack of fit with repeated observations.
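
The partition (5.8.1) and the statistic (5.8.13) can be sketched numerically as follows. This is a minimal illustration and not part of the original text: the function names `fit_line` and `lack_of_fit` and the small data set are hypothetical, the straight-line model with $p = 2$ is assumed, and the critical value $F_{1-\alpha}(r-2, n-r)$ must still be taken from a table.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = b0 + b1*X; returns (b0, b1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lack_of_fit(groups, p=2):
    """groups is a list of (X_i, [Y_i1, ..., Y_im_i]) with repeated Y values."""
    xs = [x for x, ys in groups for _ in ys]
    ys_all = [y for _, ys in groups for y in ys]
    b0, b1 = fit_line(xs, ys_all)
    n, r = len(ys_all), len(groups)
    ss_e = 0.0  # pure error sum of squares, d.f. = n - r
    ss_l = 0.0  # lack of fit sum of squares, d.f. = r - p
    for x, ys in groups:
        m = len(ys)
        local_mean = sum(ys) / m
        ss_e += sum((y - local_mean) ** 2 for y in ys)
        ss_l += m * (local_mean - (b0 + b1 * x)) ** 2
    # residual sum of squares about the fitted line, d.f. = n - p
    ss_r = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys_all))
    f_l = (ss_l / (r - p)) / (ss_e / (n - r))
    return {"SS_e": ss_e, "SS_l": ss_l, "SS_r": ss_r, "F_l": f_l}


# Hypothetical data: two measurements at each of four X values.
groups = [(1.0, [1.0, 1.2]), (2.0, [2.1, 1.9]), (3.0, [3.0, 3.2]), (4.0, [3.9, 4.1])]
result = lack_of_fit(groups)
print(result)
```

The returned `SS_r` equals `SS_e + SS_l` to rounding error, which is a convenient check on any implementation of the partition.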

The use of the $F_l$ statistic as given by (5.8.13) does not preclude the use
of the F statistic given by (5.7.2). They give different information. F,
(5.7.2), can be used whether there are repeated measurements or not; it
tells whether $\beta_1$ is needed and can be generalized to investigate the validity
of adding another or several parameters to the model. For cases where
there are repeated measurements, the $F_l$ test can indicate if the model is
satisfactory (with no reference to adding another parameter) and can tell if
$\sigma^2$ can be estimated from $s^2$. For repeated measurements both tests should
be used.

With the two F tests we can have four combinations associated with (a)
significant (or not significant) lack of fit and (b) significant (or not
significant) linear regression. These combinations are illustrated in Fig. 5.6
and the results are summarized in Table 5.6. In each case the model

$$Y = \beta_0 + \beta_1 X + \varepsilon = \beta_0' + \beta_1 (X - \bar{X}) + \varepsilon$$

is used.












Figure 5.6. Typical straight line situations. (Adapted from Applied Regression Analysis by Norman R. Draper and Harry Smith, 
John Wiley & Sons.) 




Table 5.6 Summary of Observations from Figure 5.6

Observation                     Case 1   Case 2   Case 3   Case 4

Significant lack of fit                             X        X
Significant linear regression     X                 X


For case 1 the linear model is adequate since there is no lack of fit and
there is significant linear regression. For case 2 the linear regression is not
significant; hence the model $\hat{Y} = \bar{Y}$ would be recommended. For case 3
there is lack of fit, but the linear regression is significant; thus one might
try $Y = \beta_0 + \beta_1 X + \beta_{11} X^2 + \varepsilon$. In case 4 there is a significant lack of fit and
not significant linear regression. A model such as $Y = \beta_0 + \beta_1 X + \beta_{11} X^2 + \varepsilon$
would be recommended even though there is not significant linear regression.
(Why?)

Both tests need not be limited to testing the adequacy of the simple
linear model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, but can be applied to linear estimation
with more parameters and even to nonlinear parameter estimation; this
can be done if there are repeated observations for the standard conditions
of zero mean, independent, constant variance, and normal errors.

After saying the above, it should be emphasized that considerable
insight can sometimes be gained in unfamiliar cases if the residuals are
plotted and inspected visually.


5.9 CONFIDENCE INTERVAL ABOUT THE POINTS ON THE REGRESSION LINE


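Let us consider a confidence interval about any point on the regression
line

$$\hat{Y}_k = b_0' + b_1 (X_k - \bar{X})$$    (5.9.1)

This requires the variance of $\hat{Y}_k$, which is given by (5.2.41a). Using this
expression with $\sigma$ replaced by $s$, the estimated standard error is

$$\text{est. s.e.}(\hat{Y}_k) = s\left[\frac{1}{n} + \frac{(X_k-\bar{X})^2}{\sum (X_i-\bar{X})^2}\right]^{1/2}$$    (5.9.2)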



Figure 5.7 Confidence intervals about points on the regression line.


which is clearly a minimum at $X_k = \bar{X}$ and becomes larger toward the
extremities; (5.9.2) implies that we do not know $\sigma$. The confidence limits
for $\hat{Y}_k$ are

$$\hat{Y}_k \pm t_{1-\alpha/2}(n-p)\,[\text{est. s.e.}(\hat{Y}_k)]$$    (5.9.3)

for $n$ observations of $Y_i$, $p$ parameters, and $100(1-\alpha)$% confidence. Figure
5.7 shows the 95%, say, confidence limits for the model (5.9.1); the curved,
hyperbolic lines about the straight regression line give the confidence
limits.
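
As a rough numerical sketch of (5.9.2) and (5.9.3), the fragment below evaluates the estimated standard error at a chosen $X_k$ and forms the limits. It is illustrative only: the function names and the numbers are hypothetical, and the value of $t_{1-\alpha/2}(n-p)$ is assumed to be supplied by the user from a table.

```python
import math


def est_se_fit(xs, s, xk):
    """Estimated standard error of the fitted value at xk, following (5.9.2)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return s * math.sqrt(1.0 / n + (xk - xbar) ** 2 / sxx)


def confidence_limits(y_hat_k, se_k, t_value):
    """Limits of (5.9.3); t_value is t_{1-alpha/2}(n-p) taken from a table."""
    half_width = t_value * se_k
    return y_hat_k - half_width, y_hat_k + half_width


# Hypothetical X values and residual standard deviation s.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
se_center = est_se_fit(xs, s=2.0, xk=3.0)  # at the mean of the X values
se_edge = est_se_fit(xs, s=2.0, xk=5.0)    # at an extremity
print(se_center, se_edge)
```

The standard error at the extremity exceeds that at $\bar{X}$, which is the source of the curved limits in Fig. 5.7.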

These limits can be interpreted as follows. Suppose that repeated sets of
measurements of $Y$ are taken at the same $X$ values as were used to find the
confidence limits given in Fig. 5.7. Then, of all the 95% confidence
intervals constructed for $\eta_k = E(Y_k)$ at $X_k$, 95% of these intervals will
contain $E(Y_k)$.

Confidence intervals and regions for parameters are discussed in 
Chapter 6. 


5.10 VIOLATION OF THE STANDARD ASSUMPTION OF ZERO MEAN 
ERRORS 

In the next few sections violations of the basic assumptions are considered.
One of the easiest to treat is the case of additive errors that do not have a
zero mean. The assumptions then are 10111111.

We are concerned here with nonzero mean errors that remain after any
appropriate corrections have been made. Suppose, however, after all
known corrections have been made, the errors still do not have a zero
mean so that

$$E(\varepsilon_i) = f_i$$    (5.10.1)

where $f_i \neq 0$. Let $\varepsilon_i$ be written as two terms, one of which has a zero mean,

$$\varepsilon_i = f_i + v_i, \qquad E(v_i) = 0$$    (5.10.2)

Consider several functions of $f_i$ in connection with Model 2, $\eta_i = \beta_1 X_i$,
with $X_i$ not being the same for all $i$. The first function that we consider is
$f_i = c$, a constant. Then $Y_i$ for Model 2 can be written

$$Y_i = \beta_1 X_i + f_i + v_i = c + \beta_1 X_i + v_i$$    (5.10.3)

where now the bias $c$ is a parameter to be estimated in addition to $\beta_1$. In
this case a one-parameter Model 2 problem becomes a two-parameter
Model 3 problem.

If $f_i$ happens to be proportional to $X_i$, or $f_i = cX_i$, then instead of (5.10.3)
we write

$$Y_i = \eta_i + \varepsilon_i = \beta_1 X_i + cX_i + v_i = (\beta_1 + c)X_i + v_i$$    (5.10.4)

and thus it is possible to estimate only the sum $\beta_1 + c$.

Another case is when $f_i = cZ_i$, where $Z_i$ is some known function which is not
proportional to $X_i$. This reduces to a Model 5 estimation problem which
involves two parameters.


5.11 VIOLATION OF THE STANDARD ASSUMPTION OF NORMALITY

If the standard assumptions excluding that of normality are valid
(11110111), ordinary least squares estimation can still be used. The resulting
least squares estimators are unbiased and have minimum variance
among all linear unbiased estimators, but they are not efficient. A consequence
of the central limit theorem is that the least squares estimators
are consistent and asymptotically efficient almost regardless of the distribution
of the errors, however. Hence when the normality assumption is not
justified, least squares estimators still retain most of their desirable properties.


We note that the previously used estimators of the variances of the 
parameters are unchanged. Confidence intervals and tests for significance 
given in this chapter are based on the assumption of normal errors, 
however; for small numbers of observations the intervals and tests could 
be substantially in error. Fortunately, for larger sample sizes and provided 
the distribution is not radically nonnormal, the confidence limits and tests 
of significance can be used as reasonable approximations. 

If the form of the underlying probability density of the errors is known,
then the maximum likelihood and maximum a posteriori methods can be
used. For example, assume that all the standard assumptions apply except
that the probability density of $\varepsilon_i$ is given by

$$f(\varepsilon_i) = (2\sigma)^{-1}\exp(-|\varepsilon_i|\,\sigma^{-1})$$    (5.11.1)

Then the ML function to minimize is

$$S_{ML} = \sum_{i=1}^{n} |Y_i - \eta_i|$$    (5.11.2)

Unfortunately, minimizing $S_{ML}$ is not as simple as it would be for normal
measurement errors.


Example 5.11.1 

For Model 1, $\eta_i = \beta_0$, estimate $\beta_0$ for the data as given below. Assume that the
assumptions 11110111 are valid and that $f(\varepsilon_i)$ is given by (5.11.1).

(a) $Y_1 = 0$, $Y_2 = 1$.

(b) $Y_1 = 0$, $Y_2 = 0.5$, $Y_3 = 1$.

(c) $Y_1 = 0$, $Y_2 = 0.25$, $Y_3 = 0.5$, $Y_4 = 1$.

(d) Generalize the results.


Solution 

(a) For the observations $Y_1 = 0$ and $Y_2 = 1$,

$$S_{ML} = |\beta_0| + |1 - \beta_0|$$

A plot of $S_{ML}$ versus $\beta_0$ shows that $S_{ML}$ has a minimum between 0 and 1. In that
range $S_{ML}$ is equal to 1. Thus there is neither a unique minimum nor a unique
parameter estimate.

(b) For the three observations of 0, 0.5, and 1, a plot of $S_{ML}$ versus $\beta_0$ gives a
minimum value of $S_{ML}$ also equal to 1 at $b_0 = 0.5$.

(c) For the four observations of 0, 0.25, 0.5, and 1, the minimum of $S_{ML}$ occurs
between $b_0 = 0.25$ and $b_0 = 0.5$.

(d) From the pattern of the answers obtained, it appears that there are two
possibilities; one is for an even number of observations $n$ and the other is for an
odd number. Let the $Y_i$ values be ordered so that the smallest $Y_i$ value is $Y_1$, the
next larger value is $Y_2$, etc. Then for $n$ even the $b_0$ value is located between $Y_{n/2}$
and $Y_{n/2+1}$. For $n$ odd $b_0$ is equal to $Y_{(n+1)/2}$.*

Another example with other than the normal distribution is given in
Section 4.9 in connection with Monte Carlo methods.
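
The pattern found in Example 5.11.1 can be checked numerically. The sketch below is illustrative only (the function names are hypothetical): it evaluates $S_{ML}$ of (5.11.2) for Model 1 and returns the interval of minimizing $b_0$ values obtained from the ordered observations.

```python
def s_ml(b0, ys):
    """Sum of absolute deviations, (5.11.2), for Model 1 (eta_i = beta_0)."""
    return sum(abs(y - b0) for y in ys)


def ml_interval(ys):
    """Interval of b0 values minimizing s_ml.

    For n odd the interval collapses to the middle ordered observation;
    for n even it runs between the two middle ordered observations.
    """
    ordered = sorted(ys)
    n = len(ordered)
    if n % 2 == 1:
        middle = ordered[n // 2]
        return middle, middle
    return ordered[n // 2 - 1], ordered[n // 2]


case_a = ml_interval([0.0, 1.0])
case_b = ml_interval([0.0, 0.5, 1.0])
case_c = ml_interval([0.0, 0.25, 0.5, 1.0])
print(case_a, case_b, case_c)
```

Evaluating `s_ml` at any point inside the returned interval and at a point just outside it shows that the sum of absolute deviations is constant inside the interval and larger outside.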


5.12 VIOLATION OF THE STANDARD ASSUMPTION OF CONSTANT
VARIANCE

When $V(\varepsilon_i) = \sigma_i^2$ varies with $i$, ordinary least squares estimation does not
yield minimum variance estimators. Minimum variance estimators can be
obtained, however, using maximum likelihood. These estimators for one-
and two-parameter cases are given in Sections 5.2 and 5.3.

The effect upon the estimator(s) can be investigated for many $\sigma_i^2$
functions. Assume that the standard assumptions (11011111) apply in this
section, where two possible functions are considered. For illustrative purposes,
the one-parameter case, Model 2, which is $\eta_i = \beta_1 X_i$, is used. The
OLS and ML estimators and variances are

$$b_{1,OLS} = \left(\sum Y_i X_i\right)\left(\sum X_i^2\right)^{-1}, \qquad V(b_{1,OLS}) = \left(\sum X_i^2\sigma_i^2\right)\left(\sum X_i^2\right)^{-2}$$    (5.12.1a,b)

$$b_{1,ML} = \left(\sum Y_i X_i\sigma_i^{-2}\right)\left(\sum X_i^2\sigma_i^{-2}\right)^{-1}, \qquad V(b_{1,ML}) = \left(\sum X_i^2\sigma_i^{-2}\right)^{-1}$$    (5.12.2a,b)

In the case of the ML estimator and variance the quantity $Z_i = X_i/\sigma_i$ can
be considered as a modified sensitivity coefficient; $Z_i$ plays the same role
as $X_i$ when OLS is used with all the standard assumptions being valid.

Before investigating some cases of nonuniform $\sigma_i^2$, some situations are
suggested where nonuniform $\sigma_i^2$ might arise. Error variances tend to
increase with the amplitude of signal (or observation). When the response
of $Y$ varies over several orders of magnitude (say, from 0.001 to 100) the
accuracy of the measuring device(s) is rarely constant. For small signals
the errors usually are even smaller; for the large signals the standard
deviation of the errors may be the same small fraction of the signal, but the
actual error may be many times the value of the smallest signal. For

•Tbe tstimator bg conforms lo ibe defuutioii ot the median given in Section J 1 1 





example, suppose the voltage of some device, such as a heat flow meter,
varies from 0.00001 to 0.1 V in a series of observations. (Another device
having large variations in output is a thermistor, for which the electric
resistance varies greatly with temperature.) In order to measure such a
range, a digital voltmeter with several full scale settings could be used. One
range might go up to 0.001 V, another range might be used for 0.001 to
0.01 V, and so on. Then for readings near 0.001 and 0.01 V the percent
accuracy might be the same; note that this implies a varying $\sigma_i^2$ that is
approximately proportional to $\eta_i^2$.


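5.12.1 Variance of $\varepsilon_i$ Given by $\sigma_i^2 = (X_i/\delta)^2\sigma^2$

One possible variation of $\sigma_i^2$ is $\sigma_i^2 = (X_i/\delta)^2\sigma^2$, where $\delta$ is some quantity with
the same units as $X_i$. The OLS estimator is unaffected, but the variance of
$b_{1,OLS}$ becomes

$$V(b_{1,OLS}) = \frac{\sigma^2 \sum X_i^4}{\delta^2\left(\sum X_i^2\right)^2}$$    (5.12.3)

The $b_{1,ML}$ estimator and variance become

$$b_{1,ML} = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}, \qquad V(b_{1,ML}) = \frac{\sigma^2}{n\delta^2}$$    (5.12.4a,b)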

Note that the variance of $b_{1,ML}$ is a simple expression, but that for OLS is
not. In order to make a comparison let $X_i = i\delta$. One can derive the
following summation expressions

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$$

$$\sum_{i=1}^{n} i^4 = \frac{n(n+1)(2n+1)(3n^2+3n-1)}{30}$$

which yield for the stipulated $\sigma_i^2$ the expression for $V(b_{1,OLS})$ of

$$V(b_{1,OLS}) = \frac{6(3n^2+3n-1)\,\sigma^2}{5n(n+1)(2n+1)\,\delta^2}$$


For large values of $n$ this expression reduces to $9\sigma^2/5n\delta^2$. Hence for large
values of $n$, the OLS estimate for this Model 2 case with $\sigma_i^2 = (X_i/\delta)^2\sigma^2$ has
a variance of $b_1$ which is 80% larger than that of $b_1$ given by ML. This
means that ML estimation is substantially superior in this case to OLS
estimation.
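
The 80% figure can be verified directly from (5.12.3) and (5.12.4b). The short sketch below is illustrative (the function names are hypothetical); it takes $X_i = i\delta$ and compares the two variances for several $n$.

```python
def var_b1_ols(n, sigma=1.0, delta=1.0):
    """V(b1_OLS) from (5.12.3) with X_i = i*delta."""
    xs = [i * delta for i in range(1, n + 1)]
    sum_x2 = sum(x ** 2 for x in xs)
    sum_x4 = sum(x ** 4 for x in xs)
    return sigma ** 2 * sum_x4 / (delta ** 2 * sum_x2 ** 2)


def var_b1_ml(n, sigma=1.0, delta=1.0):
    """V(b1_ML) from (5.12.4b)."""
    return sigma ** 2 / (n * delta ** 2)


def variance_ratio(n):
    """Ratio V(b1_OLS)/V(b1_ML); tends to 9/5 as n grows."""
    return var_b1_ols(n) / var_b1_ml(n)


for n in (1, 5, 50, 1000):
    print(n, variance_ratio(n))
```

For $n = 1$ the two estimators coincide and the ratio is 1; the ratio rises toward 1.8 as $n$ increases.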

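One further benefit of the maximum likelihood (ML) method of estimation
is that it can be used to provide an estimate of $\sigma^2$. This can be
accomplished by replacing $\sigma_i^2$ in (5.3.1) and (5.3.2) by $(X_i/\delta)^2\sigma^2$, differentiating
(5.3.1) with respect to $\sigma^2$, and then replacing $\sigma^2$ by $\hat{\sigma}^2$ and $\eta_i$ by $\hat{Y}_i$ to
get

$$\hat{\sigma}^2 = \frac{\delta^2}{n}\sum_{i=1}^{n}\frac{(Y_i-\hat{Y}_i)^2}{X_i^2}$$    (5.12.5)

which is a consistent, asymptotically efficient, and biased estimate.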
5.12.2 Variance of $\varepsilon_i$ Equal to $\sigma^2\eta_i^2$

A commonly occurring case is for the standard deviation of the error to be
proportional to the dependent variable $\eta_i$. In terms of the variance of $\varepsilon_i$,
this can be expressed by

$$\sigma_i^2 = \eta_i^2\sigma^2$$    (5.12.6)

The OLS estimator is the same as usual but the variance can only be
approximated. For our purposes it is permissible to replace $E(Y_i) = \eta_i$ by
$\hat{Y}_i$, the regression value for OLS; then let

$$\sigma_i^2 \approx \hat{Y}_i^2\sigma^2$$    (5.12.7)

In ML estimation the relation $\sigma_i^2 = \eta_i^2\sigma^2$ makes the problem nonlinear
because the parameters appear in both the denominator and numerator of
$S_{ML}$ given by (5.3.2) and also in the $\ln \sigma_i^2$ term contained in (5.3.1). A
suggested procedure to get approximate ML values is to first solve for the
parameter(s) using OLS and so obtain approximate values of $\hat{Y}_{i,OLS}$. These
are then used to approximate $\sigma_i^2$ as $\hat{Y}_{i,OLS}^2\sigma^2$ in the ML estimators such as
(5.12.2a).


5.13 VIOLATION OF STANDARD ASSUMPTION OF UNCORRELATED
ERRORS

In the past decade there has been widespread use of automatic digital data 
acquisition equipment in connection with dynamic experiments Transient 
temperatures have been measured for example, by using such equipment 





to digitize the response of thermocouples. However, measurement errors 
tend to become correlated as the high sampling rate capability is used. In 
such cases the standard assumption of independent observation errors is 
not valid. 

One might also obtain correlated measurements by testing the same 
specimen using the same sensors for different ranges of the independent 
variable X,. Examples are measurements for a particular steel specimen at 
different temperatures for a property such as thermal conductivity, electric 
resistance, or hardness. 

The standard assumptions of zero mean and uncorrelated measurement
errors given by (5.1.7) and (5.1.9) result in

$$E(\varepsilon_i\varepsilon_k) = 0 \quad \text{for } i \neq k$$    (5.13.1)

When this equation is not true many descriptive terms have been used;
these terms include colored, correlated, not independent, and dependent
errors. Some specific types of correlated errors are called autoregressive
(AR), moving average (MA), and autoregressive-moving average (ARMA).
Only AR errors are considered in this section. For further discussion see
Chapter 6.

Let us consider a case with additive, zero mean, autoregressive errors in
$Y_i$. There are no errors in the $X_i$'s. We can then write

$$Y_i = \eta_i + \varepsilon_i, \qquad E(Y_i \mid \beta) = \eta_i$$    (5.13.2)

The measurement errors are described by the model

$$\varepsilon_i = \rho_i\varepsilon_{i-1} + u_i, \qquad E(u_i) = 0, \qquad E(u_iu_j) = \begin{cases}\sigma_i^2 & \text{for } i = j\\ 0 & \text{for } i \neq j\end{cases}$$    (5.13.3)

which is called first-order autoregressive since the error $\varepsilon_i$ depends on the
error $\varepsilon_{i-1}$, which is for the preceding time. (Second-order errors would
depend on two preceding times, etc.) In the following analysis the $\rho_i$ and $\sigma_i^2$
values are assumed to be known. There is no prior information. The
associated assumptions are designated 1102-111.

Rather than using the direct matrix maximum likelihood approach of
Chapter 6, we shall attempt to construct some sums of squares of terms
that are uncorrelated and have constant variance. In other words a
transformation is to be used to obtain modified measurements for which
the assumptions 1111-111 are valid. Using (5.13.3), write (5.13.2) at times $i$ and $i-1$
as

$$Y_i = \eta_i + \rho_i\varepsilon_{i-1} + u_i$$    (5.13.4a)

$$Y_{i-1} = \eta_{i-1} + \varepsilon_{i-1}$$    (5.13.4b)

Multiply (5.13.4b) by $\rho_i$ and subtract from (5.13.4a) to get

$$Y_i - \rho_iY_{i-1} = \eta_i - \rho_i\eta_{i-1} + u_i$$    (5.13.5)

Define the transformed observation $F_i$ and model $H_i$ as

$$F_i = Y_i - \rho_iY_{i-1}, \qquad H_i = \eta_i - \rho_i\eta_{i-1}$$    (5.13.6a,b)

Then analogous to (5.13.2) a transformed model is

$$F_i = H_i + u_i$$    (5.13.7)

where the value $F_i$ is now independent from other $F_j$ ($j \neq i$) values.
Notice that the term $u_i$ divided by $\sigma_i$ has a variance of unity for all $i$. This
suggests that a sum of squares of independent, constant variance terms can
be constructed from the $u_i/\sigma_i$ values, or

$$S = \sum_{i=1}^{n}(F_i - H_i)^2\sigma_i^{-2}$$    (5.13.8)

where $F_i$ and $H_i$ are given by (5.13.6) provided

$$\varepsilon_0 = 0$$    (5.13.9)

It is important to note that (5.13.8) has been derived without restricting the
problem to cases for which $\eta$ is linear in the parameters; hence it can be
used for linear and nonlinear cases.

In Chapter 6 it is shown that the function given by (5.13.8) must be
minimized for ML estimation if, in addition to the assumptions given
above, the errors $u_i$ are normal.
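
A minimal sketch of the transformation (5.13.6) and the sum of squares (5.13.8) is given below. It is not from the original text: the function name and the numbers are hypothetical, and the start-up treatment is an assumption, namely that the term preceding the first observation contributes nothing, so that $F_1 - H_1 = Y_1 - \eta_1$.

```python
def ar1_sum_of_squares(ys, etas, rhos, sigmas):
    """Sum of squares (5.13.8) for first-order autoregressive errors.

    ys, etas: observations Y_i and model values eta_i in time order.
    rhos, sigmas: known rho_i and sigma_i for each time.
    Assumption (start-up): no contribution from the time before the first
    observation, so the first term is simply (Y_1 - eta_1)/sigma_1.
    """
    total = 0.0
    prev_y = 0.0
    prev_eta = 0.0
    for y, eta, rho, sigma in zip(ys, etas, rhos, sigmas):
        f_i = y - rho * prev_y        # transformed observation, (5.13.6a)
        h_i = eta - rho * prev_eta    # transformed model, (5.13.6b)
        total += ((f_i - h_i) / sigma) ** 2
        prev_y, prev_eta = y, eta
    return total


# Hypothetical observations and model values.
ys = [1.0, 2.0, 3.0]
etas = [1.0, 1.5, 2.5]
s_correlated = ar1_sum_of_squares(ys, etas, [0.5] * 3, [1.0] * 3)
s_uncorrelated = ar1_sum_of_squares(ys, etas, [0.0] * 3, [1.0] * 3)
print(s_correlated, s_uncorrelated)
```

With all $\rho_i = 0$ the function reduces to the usual weighted sum of squares, which gives a simple check. In an estimation problem `etas` would depend on the parameters and this function would be the quantity to minimize.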

5.14 ERRORS IN INDEPENDENT AND DEPENDENT VARIABLES

Another violation of the standard assumptions is that of the independent
variables, designated $X_i$ in this chapter, being stochastic as well as $Y_i$. In
order to present a method of solution that can be generalized to complex
situations, the method of Lagrange multipliers is introduced in this section.
For the simple example to be given it is not required, but this method of
solution is illustrated. Before giving the example, the method of Lagrange
multipliers is presented.

5.14.1 Method of Lagrange Multipliers

We consider the problem of finding a stationary (a relative maximum or
minimum) value of the continuously differentiable function $f(a_1,\ldots,a_m)$
that is subject to $n$ equality constraints,

$$\phi_i(a_1,\ldots,a_m) = 0, \qquad i = 1,\ldots,n$$    (5.14.1)

where $m > n$ and the $\phi_i$ are differentiable functions. Since the $m$ variables
$a_1,\ldots,a_m$ must satisfy $n$ constraints, there are in effect only $m - n$
independent variables. A stationary value of $f(a_1,\ldots,a_m)$ requires that

$$df = \frac{\partial f}{\partial a_1}da_1 + \frac{\partial f}{\partial a_2}da_2 + \cdots + \frac{\partial f}{\partial a_m}da_m = 0$$    (5.14.2)

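but the differentials $da_i$ are not independent. The constraints (5.14.1) imply
the $n$ differential relations

$$\frac{\partial\phi_1}{\partial a_1}da_1 + \frac{\partial\phi_1}{\partial a_2}da_2 + \cdots + \frac{\partial\phi_1}{\partial a_m}da_m = 0$$
$$\vdots$$
$$\frac{\partial\phi_n}{\partial a_1}da_1 + \frac{\partial\phi_n}{\partial a_2}da_2 + \cdots + \frac{\partial\phi_n}{\partial a_m}da_m = 0$$    (5.14.3)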


A direct method of solution can be illustrated by a simple case. Suppose
$m = 3$ so that (5.14.2) becomes

$$df = \frac{\partial f}{\partial a_1}da_1 + \frac{\partial f}{\partial a_2}da_2 + \frac{\partial f}{\partial a_3}da_3 = 0$$    (5.14.4a)

Let there be only one constraint so that $n = 1$ and then (5.14.3) gives

$$\frac{\partial\phi_1}{\partial a_1}da_1 + \frac{\partial\phi_1}{\partial a_2}da_2 + \frac{\partial\phi_1}{\partial a_3}da_3 = 0$$    (5.14.4b)

which could be solved for $da_3$, say. This expression substituted for $da_3$ in
(5.14.4a) then would give

$$df = (\ldots)\,da_1 + (\ldots)\,da_2$$    (5.14.4c)

where the two different expressions in the parentheses are set equal to zero
because the $da_1$ and $da_2$ terms can now be arbitrarily assigned. These two
equations coming from the parentheses in (5.14.4c) plus $\phi_1 = 0$ would
provide three equations for the three unknowns, $a_1$, $a_2$, and $a_3$.

An alternative procedure is called the Lagrange multiplier method. This
method is introduced using the same example of $m = 3$ and one constraint.

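Multiply (5.14.4b) by $\lambda_1$ and add the results to (5.14.4a). Since the
right-hand members are zeros, there follows

$$\left(\frac{\partial f}{\partial a_1} + \lambda_1\frac{\partial\phi_1}{\partial a_1}\right)da_1 + \left(\frac{\partial f}{\partial a_2} + \lambda_1\frac{\partial\phi_1}{\partial a_2}\right)da_2 + \left(\frac{\partial f}{\partial a_3} + \lambda_1\frac{\partial\phi_1}{\partial a_3}\right)da_3 = 0$$    (5.14.5)

for an arbitrary value of $\lambda_1$. Now let $\lambda_1$ be determined so that one of the
parentheses in (5.14.5) vanishes. Then the two differentials multiplying the
remaining parentheses can be arbitrarily assigned and hence these two
parentheses must also vanish. Consequently we must have

$$\frac{\partial f}{\partial a_1} + \lambda_1\frac{\partial\phi_1}{\partial a_1} = 0$$    (5.14.6a)

$$\frac{\partial f}{\partial a_2} + \lambda_1\frac{\partial\phi_1}{\partial a_2} = 0$$    (5.14.6b)

$$\frac{\partial f}{\partial a_3} + \lambda_1\frac{\partial\phi_1}{\partial a_3} = 0$$    (5.14.6c)

Then these three equations, (5.14.6a,b,c), plus the constraint $\phi_1 = 0$ comprise
four equations for solving for the four unknowns $a_1$, $a_2$, $a_3$, and $\lambda_1$.
The quantity $\lambda_1$ is known as a Lagrange multiplier. The introduction of
these multipliers frequently simplifies and organizes the relevant algebra
in minimization problems with equality constraints. It is important to
note that the conditions given by (5.14.6) are equivalent to requiring that
$f + \lambda_1\phi_1$ be stationary without any further constraints being imposed.
Applying this observation to the more general problem given above
suggests that

$$f + \lambda_1\phi_1 + \lambda_2\phi_2 + \cdots + \lambda_n\phi_n$$

be extremized with respect to $a_1,\ldots,a_m$. Hence the following $m$ equations
must be satisfied

$$\frac{\partial f}{\partial a_j} + \sum_{i=1}^{n}\lambda_i\frac{\partial\phi_i}{\partial a_j} = 0, \qquad j = 1,\ldots,m$$    (5.14.7)

along with the $n$ constraints given by (5.14.1). Thus (5.14.1) and (5.14.7)
constitute a set of $m + n$ equations for the $m + n$ unknowns
$a_1,\ldots,a_m,\lambda_1,\ldots,\lambda_n$.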




5.14.2 Problem of Errors in the Independent and Dependent Variables 

A problem which is nonlinear even though the model is linear in the 
parameters is the estimation of the parameters in the presence of errors in 
the independent variables as well as the dependent variables. The problem 
is formulated in this subsection and the solution of a simple case is 
considered in the next. 

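Consider first the dependent variable $Y_i$, which is related to the model by

$$Y_i = \eta_i + \varepsilon_{Y_i}$$    (5.14.8)

and thus the error $\varepsilon_{Y_i}$ is additive. Also let $\varepsilon_{Y_i}$ have a zero mean, be
independent from $\varepsilon_{Y_j}$ for $i \neq j$, and have a normal probability density with
known variance terms, or

$$E(\varepsilon_{Y_i}) = 0, \qquad E(\varepsilon_{Y_i}\varepsilon_{Y_j}) = 0 \text{ for } i \neq j,$$
$$E(\varepsilon_{Y_i}^2) = \sigma_{Y_i}^2 \text{ and } \varepsilon_{Y_i} \text{ is normal}$$    (5.14.9)

These assumptions are designated 110111--. With this information the
probability density of $\varepsilon_{Y_1},\ldots,\varepsilon_{Y_n}$ is

$$f(\varepsilon_{Y_1},\ldots,\varepsilon_{Y_n}) = \frac{1}{(2\pi)^{n/2}\sigma_{Y_1}\cdots\sigma_{Y_n}}\exp\left[-\frac{1}{2}\sum_{j=1}^{n}\varepsilon_{Y_j}^2\sigma_{Y_j}^{-2}\right]$$    (5.14.10)

There are also errors in the independent variables $X_{ij}$, which are described
by

$$X_{ij} = \xi_{ij} + \varepsilon_{X_{ij}}, \qquad E(\varepsilon_{X_{ij}}) = 0,$$
$$E(\varepsilon_{X_{ij}}\varepsilon_{X_{kl}}) = 0 \text{ except when } i = k \text{ and } j = l, \qquad E(\varepsilon_{X_{ij}}^2) = \sigma_{X_{ij}}^2$$    (5.14.11)

and $\varepsilon_{X_{ij}}$ has a normal density. The $\sigma_{X_{ij}}$ values are assumed to be known.
The value $X_{ij}$ is measured and $\xi_{ij}$ is the true value of $X_{ij}$. The errors $\varepsilon_{Y_i}$ and
$\varepsilon_{X_{jk}}$ are considered to be independent for all values of $i$, $j$, and $k$.
Analogous to (5.14.10) we can write

$$f(\varepsilon_{X_{11}},\ldots,\varepsilon_{X_{np}}) = \frac{1}{(2\pi)^{np/2}\prod_{i=1}^{n}\prod_{j=1}^{p}\sigma_{X_{ij}}}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{p}\varepsilon_{X_{ij}}^2\sigma_{X_{ij}}^{-2}\right]$$    (5.14.12)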

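Owing to the independence of the $\varepsilon_{Y_i}$ and $\varepsilon_{X_{jk}}$ errors, the maximum
likelihood method of estimating the parameters requires that the product
of (5.14.10) and (5.14.12) be maximized with respect to the parameters
$\beta_1,\beta_2,\ldots,\beta_p$ and the $\eta_1,\ldots,\eta_n$, $\xi_{11},\ldots,\xi_{np}$ values. This is equivalent to
minimizing

$$S = \sum_{i=1}^{n}(Y_i-\eta_i)^2\sigma_{Y_i}^{-2} + \sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-\xi_{ij})^2\sigma_{X_{ij}}^{-2}$$    (5.14.13)

with respect to $\beta_1,\ldots,\beta_p$, $\eta_1,\ldots,\eta_n$, $\xi_{11},\ldots,\xi_{np}$, or a total of $(n + p + np)$
parameters. This will produce the estimates $b_1,b_2,\ldots,b_p$, $\hat{Y}_1,\ldots,\hat{Y}_n$,
$\hat{X}_{11},\ldots,\hat{X}_{np}$. The $\eta_i$ and $\xi_{ij}$ values are not independent, however, and must
be related through the model for $\eta_i$, which can be written as the equality
constraint

$$g_i = \eta_i - \eta(\xi_{i1},\ldots,\xi_{ip},\beta_1,\ldots,\beta_p) = 0$$    (5.14.14)

which applies for $i = 1,2,\ldots,n$.

The method of Lagrange multipliers involves minimizing the function

$$L = \tfrac{1}{2}S(\eta,\xi) + \sum_{i=1}^{n}\lambda_i\,g_i(\eta_i,\xi,\beta)$$    (5.14.15)

with respect to the parameters $\beta_1,\ldots,\beta_p$, $\eta_1,\ldots,\eta_n$, $\xi_{11},\ldots,\xi_{np}$. Necessary
conditions for a minimum are

$$\frac{\partial L}{\partial\beta_k} = 0, \qquad k = 1,\ldots,p$$    (5.14.16a)

$$\frac{\partial L}{\partial\eta_i} = 0, \qquad i = 1,2,\ldots,n$$    (5.14.16b)

$$\frac{\partial L}{\partial\xi_{ik}} = 0, \qquad i = 1,\ldots,n;\; k = 1,\ldots,p$$    (5.14.16c)

The expressions in (5.14.16) are evaluated at $\beta_k = b_k$, $\eta_i = \hat{Y}_i$, and
$\xi_{ik} = \hat{X}_{ik}$. It is important to note that

$$S = S(\eta_1,\ldots,\eta_n,\xi_{11},\ldots,\xi_{np})$$    (5.14.17a)

$$g_i = g_i(\eta_i,\xi_{i1},\ldots,\xi_{ip},\beta_1,\ldots,\beta_p)$$    (5.14.17b)

Thus $S$ is not an explicit function of the parameters $\beta_1,\ldots,\beta_p$. Then
(5.14.16) for any model, linear or nonlinear, can be written as

$$\sum_{i=1}^{n}\lambda_i\frac{\partial g_i}{\partial\beta_k} = 0, \qquad k = 1,\ldots,p$$    (5.14.18a)

$$-(Y_i-\hat{Y}_i)\sigma_{Y_i}^{-2} + \lambda_i = 0, \qquad i = 1,\ldots,n$$    (5.14.18b)

$$-(X_{ik}-\hat{X}_{ik})\sigma_{X_{ik}}^{-2} + \lambda_i\frac{\partial g_i}{\partial\xi_{ik}} = 0$$    (5.14.18c)

for $i = 1,\ldots,n$; $k = 1,\ldots,p$. In addition to the equations given by (5.14.18)
there are the constraints $g_i = 0$ which, for the
linear model considered in this section, are equivalent to

$$\hat{Y}_i = b_1\hat{X}_{i1} + b_2\hat{X}_{i2} + \cdots + b_p\hat{X}_{ip}, \qquad i = 1,2,\ldots,n$$    (5.14.19)

Then (5.14.18) and (5.14.19) provide $p + 2n + np$ equations for the same
number of unknowns, which are $b_1,\ldots,b_p$, $\hat{Y}_1,\ldots,\hat{Y}_n$, $\lambda_1,\ldots,\lambda_n$,
$\hat{X}_{11},\ldots,\hat{X}_{np}$.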

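Consider first (5.14.18) without introducing the assumption of a model
linear in the parameters such as (5.14.19). Then in general (5.14.18b) yields

$$\lambda_i = (Y_i-\hat{Y}_i)\sigma_{Y_i}^{-2}$$    (5.14.20)

Thus the Lagrange multipliers are weighted residuals. Introducing (5.14.20)
into (5.14.18a,c) eliminates $\lambda_i$ and gives

$$\sum_{i=1}^{n}(Y_i-\hat{Y}_i)\sigma_{Y_i}^{-2}\frac{\partial g_i}{\partial\beta_k} = 0, \qquad k = 1,\ldots,p$$    (5.14.21a)

$$-(X_{ik}-\hat{X}_{ik})\sigma_{X_{ik}}^{-2} + (Y_i-\hat{Y}_i)\sigma_{Y_i}^{-2}\frac{\partial g_i}{\partial\xi_{ik}} = 0, \qquad i = 1,\ldots,n;\; k = 1,\ldots,p$$    (5.14.21b)

Hence, for the general nonlinear case, (5.14.21) and a set of constraint
equations, $g_i = 0$, can be solved for the $p + n + np$ unknowns
$b_1,\ldots,b_p$, $\hat{Y}_1,\ldots,\hat{Y}_n$, $\hat{X}_{11},\ldots,\hat{X}_{np}$.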

Let now the linear model and its constraint, (5.14.19), be used. Then
(5.14.21) can be given as

$$\sum_{i=1}^{n}\Bigl(Y_i - \sum_{j=1}^{p}b_j\hat{X}_{ij}\Bigr)\sigma_{Y_i}^{-2}\hat{X}_{ik} = 0, \qquad k = 1,\ldots,p$$    (5.14.22a)

$$(X_{ik}-\hat{X}_{ik})\sigma_{X_{ik}}^{-2} + \Bigl(Y_i - \sum_{j=1}^{p}b_j\hat{X}_{ij}\Bigr)b_k\sigma_{Y_i}^{-2} = 0, \qquad i = 1,\ldots,n;\; k = 1,\ldots,p$$    (5.14.22b)

which comprise $p + np$ equations for the unknowns $b_1,\ldots,b_p$, $\hat{X}_{11},\ldots,\hat{X}_{np}$.
Notice that, even though the model is linear in the parameters $\beta_1,\ldots,\beta_p$,
the solution of (5.14.22) is nonlinear and thus is not straightforward. One
way to start is to note that (5.14.22b) for fixed $i$ provides a set of linear
equations for $\hat{X}_{i1},\ldots,\hat{X}_{ip}$ which can be solved in terms of the $b_1,\ldots,b_p$
values. When the $\hat{X}_{ik}$ values are substituted into (5.14.22a) a set of $p$
nonlinear equations results for the unknowns $b_k$. The simplest case is
for $p = 1$, which is considered next.


5.14.3 Model 2 ($\eta_i = \beta\xi_i$) Example with Errors in Both $\eta_i$ and $\xi_i$

As an example of the above procedure consider a case involving Model 2,
$\eta_i = \beta\xi_i$, where there are errors in both the dependent variable $\eta_i$ and the
independent variable $\xi_i$,

$$Y_i = \eta_i + \varepsilon_{Y_i}$$    (5.14.23a)

$$X_i = \xi_i + \varepsilon_{X_i}$$    (5.14.23b)

Let the assumptions given above for $\varepsilon_Y$ and $\varepsilon_X$ apply except let $\sigma_{Y_i}^2$ and
$\sigma_{X_i}^2$ be the constants $\sigma_Y^2$ and $\sigma_X^2$, respectively.

We can obtain the solution for $b$, an estimate of $\beta$, through the use of
(5.14.22a,b). Using first (5.14.22b) gives

$$(X_i-\hat{X}_i)\sigma_X^{-2} + (Y_i - b\hat{X}_i)\,b\,\sigma_Y^{-2} = 0$$    (5.14.24)

which can be solved for $\hat{X}_i$ to obtain

$$\hat{X}_i = \frac{X_i + Y_i b a}{1 + b^2 a}$$    (5.14.25)

where $a \equiv \sigma_X^2/\sigma_Y^2$. Note that $\hat{X}_i$ is a nonlinear function of $b$. Introducing
(5.14.25) into (5.14.22a) gives the nonlinear equation,

$$\sum_{i=1}^{n}\left[Y_i - b\,\frac{X_i + Y_i b a}{1 + b^2 a}\right]\frac{X_i + Y_i b a}{1 + b^2 a} = 0$$    (5.14.26)

For convenience let

$$S_{YY} = \sum Y_i^2, \qquad S_{XY} = \sum X_iY_i,$$    (5.14.27a,b)

$$S_{XX} = \sum X_i^2$$    (5.14.27c)

and then (5.14.26) can be expanded to

$$S_{XY} + baS_{YY} + b^2aS_{XY} + b^3a^2S_{YY} = bS_{XX} + 2b^2aS_{XY} + b^3a^2S_{YY}$$

which can be simplified to

$$(aS_{XY})\,b^2 + (S_{XX} - aS_{YY})\,b - S_{XY} = 0$$    (5.14.28)

which in turn can be solved for $b$,

$$b = \frac{aS_{YY} - S_{XX} \pm \left[(aS_{YY} - S_{XX})^2 + 4aS_{XY}^2\right]^{1/2}}{2aS_{XY}}$$    (5.14.29)

The positive sign is chosen in the $\pm$ sign in (5.14.29) because then the
estimate will converge to the correct value of $S_{XY}/S_{XX}$ when $a = \sigma_X^2/\sigma_Y^2 \to 0$.
If $a \to \infty$, $b$ approaches $S_{YY}/S_{XY}$. Equation 5.14.29 also gives $b =
S_{XY}/S_{XX}$ for all values of $a$ if it happens that $S_{XY}/S_{XX}$ is equal to
$S_{YY}/S_{XY}$ or, in other terms, $S_{XX}S_{YY} - S_{XY}^2 = 0$. In ordinary least squares
estimation involving Model 2, we do not permit $S_{XX}$ to be equal to zero. If
$S_{XY}$ is equal to zero, (5.14.28) gives $b = 0$.

After $b$ is calculated using (5.14.29), the estimated values $\hat{X}_i$ can be
obtained from (5.14.25). Observe that a different $\hat{X}_i$ is calculated from
(5.14.25) for each $i$ value if the $Y_i$ values are different even if the $X_i$ values
are actually the same. Physically a given $X_i$ value may not be known
precisely, but it may be known that it is constant for several measurements.
However, if this is the case the assumption of independent errors in each $X_i$
is violated. Hence another analysis is required for this special case of
repeated $Y_i$ values at precisely the same $X_i$ value.
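
The calculation of $b$ from (5.14.29) and of the adjusted values from (5.14.25) can be sketched as follows. This is an illustrative fragment, not the authors' program: the function names are hypothetical, the case $a = 0$ is handled separately as ordinary least squares, and $S_{XY} = 0$ is taken to give $b = 0$ as noted after (5.14.28). The numerical check uses the four velocity measurements quoted in Example 5.14.2 with $\sigma_X = 0.0003$ and $\sigma_U = 3$.

```python
import math


def eiv_slope(xs, ys, a):
    """Slope b for Model 2 with errors in X and Y, where a = sigma_X^2/sigma_Y^2."""
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if sxy == 0.0:
        return 0.0              # (5.14.28) gives b = 0 in this case
    if a == 0.0:
        return sxy / sxx        # ordinary least squares limit
    d = a * syy - sxx
    # positive root of (5.14.28), as in (5.14.29)
    return (d + math.sqrt(d * d + 4.0 * a * sxy * sxy)) / (2.0 * a * sxy)


def eiv_fitted(xs, ys, a, b):
    """Adjusted X values from (5.14.25) and the corresponding Y = b * X values."""
    x_hat = [(x + y * b * a) / (1.0 + b * b * a) for x, y in zip(xs, ys)]
    y_hat = [b * x for x in x_hat]
    return x_hat, y_hat


# Data quoted in Example 5.14.2: distance X (cm) and velocity U (cm/sec).
X = [0.0112, 0.0162, 0.0215, 0.0310]
U = [80.0, 125.0, 165.0, 235.0]
a = 0.0003 ** 2 / 3.0 ** 2
b_both = eiv_slope(X, U, a)
b_ols = eiv_slope(X, U, 0.0)
x_hat, u_hat = eiv_fitted(X, U, a, b_both)
print(b_both, b_ols, x_hat, u_hat)
```

For these data the two slopes differ only slightly, which agrees with the remark that the predicted lines are often quite close.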


Example 5.14.1 

Consider a case involving Model 2, with errors in either $Y_i$ or $X_i$ or both, that
satisfies the assumptions given above in Sections 5.14.2 and 5.14.3. The data are

    i     X_i     Y_i
    1     -1      1 - δ
    2      1      1

Let $\delta$ be a positive value. Also investigate the case for $\delta \to 0$.

(a) Find $b$, $\hat{X}_i$, and $\hat{Y}_i$ for $\sigma_X^2 = \sigma_Y^2$.
(b) Find $b$ and $\hat{Y}_i$ for $\sigma_X^2 = 0$.
(c) Find $b$ and $\hat{X}_i$ for $\sigma_Y^2 = 0$.

Solution

To find the $b$ values (5.14.29) can be used. Hence find $S_{XX}$, $S_{YY}$, and $S_{XY}$ from
(5.14.27) to be $S_{XX} = 2$, $S_{YY} = 2 - 2\delta + \delta^2$, and $S_{XY} = \delta$.

(a) In this case $a = 1$ and (5.14.29) gives

$$b = \frac{\delta^2 - 2\delta + \left[(\delta^2-2\delta)^2 + 4\delta^2\right]^{1/2}}{2\delta} = \frac{\delta}{2} - 1 + \frac{1}{2}\left[(\delta-2)^2 + 4\right]^{1/2}$$

If $\delta \to 0$, $b \to -1 + 2^{1/2} = 0.4142136$.
The $\hat{X}_i$ values are found from (5.14.25),

$$\hat{X}_i = \frac{X_i + Y_i b}{1 + b^2}$$

and $\hat{Y}_i$ is equal to $b\hat{X}_i$. For $\delta \to 0$ we obtain $\hat{X}_1 = -0.5$, $\hat{Y}_1 = -0.2071067$ and
$\hat{X}_2 = 1.2071067$, $\hat{Y}_2 = 0.5$. For $\sigma_X^2 = \sigma_Y^2 = 1$ the sum $S$ given by (5.14.13) is precisely 2.

(b) This is the usual least squares case and $b = S_{XY}/S_{XX}$ is equal to $\delta/2$. The $\hat{Y}_i$
values are $\hat{Y}_1 = -\delta/2$ and $\hat{Y}_2 = \delta/2$. For $\delta = 0$ the values are zero; hence the
predicted line is $\hat{Y}_i = 0$. Again the minimum $S$ for $\delta = 0$ is 2.

(c) For this case $b = S_{YY}/S_{XY} = (2 - 2\delta + \delta^2)/\delta$. The $\hat{X}_i$ values are found from

$$\hat{X}_i = \frac{Y_i}{b}$$

For $\delta \to 0$, $b \to \infty$ and $\hat{X}_1 = \hat{X}_2 = 0$. Unlike part (b) the predicted line is now the
vertical axis, for $\hat{X}_i = 0$ for all $i$. The minimum $S$ is again 2.

It is instructive to examine the predicted lines for each of the cases above. See
Fig. 5.8 for $\delta \to 0$. Notice that the usual least squares case ($a = 0$) has the predicted
line of $\hat{Y} = 0$; the $Y_1 = 1$, $X_1 = -1$ observation is replaced with $\hat{Y}_1 = 0$ and $\hat{X}_1 = -1$,
and $Y_2 = 1$, $X_2 = 1$ is replaced with $\hat{Y}_2 = 0$, $\hat{X}_2 = 1$. The case for $a \to \infty$ has the
vertical predicted line of $\hat{X} = 0$; the two observations are replaced by the single
point $\hat{Y}_1 = \hat{Y}_2 = 1$ with $\hat{X} = 0$. For the $a = 1$ case, the predicted line is inclined as
shown. It is thus clear that the three $a$ values can yield quite different predicted
values. In other words it can make a large difference in the predicted line whether
the errors are in $Y$ or $X$ or both. This case shown in Fig. 5.8 is an extreme one,
however, because many times the predicted lines are quite close.






Figure 5.8 Predicted lines for errors in dependent and independent variables for data points 
(-1,1) and (1, 1) for Example 5.14.1. 


A case for which the predicted lines are much closer together is when $\delta = 1$,
causing $Y_1 = 0$ with $X_1 = -1$ as shown in Fig. 5.9. If $\delta$ were equal to 2 so that
$Y_1 = -1$, then for any $a$ such that $0 < a < \infty$ the predicted lines are all the same.

Example 5.14.2 

Near a wall over which a turbulent fluid is flowing, the velocity is a linear function 
of position. Let the velocity (in cm/sec) be designated u and the distance from the 
wall (in cm) be designated x. The below data were taken from Fig. 6.20 of Kreith 



Figure 5.9 Predicted lines for errors in dependent and independent variables for data points 
(-1, 0) and (1, 1) for Example 5.14.1.






[5]. Estimated values of $\sigma_X$ and $\sigma_U$ are also given.

i    X (cm)    sigma_X (cm)    U (cm/sec)    sigma_U (cm/sec)
1    0.0112    0.0003           80           3
2    0.0162    0.0003          125           3
3    0.0215    0.0003          165           3
4    0.0310    0.0003          235           3

The models for the true velocity $u$ and the true distance $x$ are

$$U = u + \varepsilon_U, \qquad X = x + \varepsilon_X, \qquad u = \beta x$$

where $U$ and $X$ are measured values and $\beta$ is a parameter which is proportional to
the shear stress at the wall. Estimates for $\beta$ are to be obtained using

(a) the above information,

(b) $\sigma_X = 0$ and $\sigma_U$ is unknown, and

(c) $\sigma_X$ is unknown and $\sigma_U = 0$. Also calculate the $\hat{U}_i$ and $\hat{X}_i$ values for each case.
The assumptions indicated in Section 5.14.2 are valid.

Solution 

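In each of the cases $u$ and $U$ are analogous to $\eta$ and $Y$, and $x$ and $X$ to $\xi$ and $X$ in
the notation given in this section. With this in mind let us then evaluate $S_{YY}$, $S_{XX}$,
$S_{XY}$, and $\alpha$ in (5.14.29).

$$S_{YY} = \sum U_i^2 = (80)^2 + (125)^2 + (165)^2 + (235)^2 = 104475$$

$$S_{XX} = \sum X_i^2 = (0.0112)^2 + (0.0162)^2 + (0.0215)^2 + (0.031)^2 = 0.00181113$$

$$S_{XY} = \sum X_i U_i = 0.0112(80) + 0.0162(125) + 0.0215(165) + 0.031(235) = 13.7535$$

$$\alpha = \sigma_X^2/\sigma_U^2 = (0.0003/3)^2 = 10^{-8}$$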





(a) For the above values the parameter $\beta$ is estimated using (5.14.29) to be
$b = 7594.7438$/sec. The values of $\hat{X}_i$ are obtained from (5.14.25) to be

$$\hat{X}_i = \frac{X_i + U_i b \alpha}{1 + b^2 \alpha} = \frac{X_i + 7.5947438 \times 10^{-5}\, U_i}{1.5768013}$$

After values are calculated for $\hat{X}_i$, the values for $\hat{U}_i$ are found using

$$\hat{U}_i = b \hat{X}_i = 7594.7438 \hat{X}_i$$

The resulting values are given below.

i    X_hat_i    U_hat_i
1    0.01096     83.21
2    0.01629    123.75
3    0.02158    163.91
4    0.03098    235.28

(b) This is the usual least squares analysis for which $b = S_{XY}/S_{XX} = 7593.8778$.
The predicted or regression line is now $\hat{U}_i = b X_i$. The values for $\hat{X}_i$ and $\hat{U}_i$ are
tabulated next.

i    X_hat_i    U_hat_i
1    0.0112      85.05
2    0.0162     123.02
3    0.0215     163.27
4    0.0310     235.41

(c) In this case the roles of $X$ and $Y$ are interchanged in the least squares
analysis. Here $b = S_{YY}/S_{XY} = 7596.2482$; $\hat{X}_i$ is obtained from $\hat{X}_i = U_i/b$; and $\hat{U}_i$ is
the measured value. The results of the calculations are as follows:

i    X_hat_i    U_hat_i
1    0.01053     80
2    0.01646    125
3    0.02172    165
4    0.03094    235


A comparison of the b values in this example reveals that there are some
differences but they are very small; the largest difference in the b values is 0.03%.
This case is more common than that shown in Figs. 5.8 and 5.9 where the predicted 
lines are quite different. Because the curves are so similar in this example, only the 
lower two points are shown in Fig. 5.10. There are negligible differences in the 
curves that can be drawn between the three sets of predicted points. 
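
The three estimates of Example 5.14.2 can be reproduced with a few lines of code. The sketch below is illustrative only. Cases (b) and (c) use the ratios quoted in the example. For case (a), equation (5.14.29) is not reproduced in this part of the text, so the code assumes that $b$ is the positive root of $\alpha S_{XY} b^2 + (S_{XX} - \alpha S_{YY}) b - S_{XY} = 0$ with $\alpha = \sigma_X^2/\sigma_U^2$, which is the stationarity condition obtained by minimizing $\sum (U_i - bX_i)^2/(1 + \alpha b^2)$. All variable names are hypothetical.

```python
import math

# Data of Example 5.14.2: distance X (cm) and velocity U (cm/sec).
X = [0.0112, 0.0162, 0.0215, 0.0310]
U = [80.0, 125.0, 165.0, 235.0]
sigma_x, sigma_u = 0.0003, 3.0

S_xx = sum(x * x for x in X)
S_yy = sum(u * u for u in U)
S_xy = sum(x * u for x, u in zip(X, U))
alpha = (sigma_x / sigma_u) ** 2  # assumed definition: ratio of the variances

# (b) errors in U only: ordinary least squares through the origin.
b_ols = S_xy / S_xx
# (c) errors in X only: the roles of X and U are interchanged.
b_inv = S_yy / S_xy
# (a) errors in both variables: positive root of the assumed quadratic
#     alpha*S_xy*b**2 + (S_xx - alpha*S_yy)*b - S_xy = 0.
p = alpha * S_xy
q = S_xx - alpha * S_yy
b_both = (-q + math.sqrt(q * q + 4.0 * p * S_xy)) / (2.0 * p)

# Adjusted points for case (a), following the form of (5.14.25).
X_hat = [(x + u * b_both * alpha) / (1.0 + b_both ** 2 * alpha)
         for x, u in zip(X, U)]
U_hat = [b_both * xh for xh in X_hat]

print(round(b_both, 2), round(b_ols, 2), round(b_inv, 2))
```

Under these assumptions the case (a) estimate falls between the two one-sided least squares estimates, which agrees with the ordering of the b values reported in the example.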



Figure 5.10 Predicted line for errors in dependent and independent variables for Example
5.14.2. (Horizontal axis: $X_i$ and $\hat{X}_i$, in cm.)


REFERENCES 

1. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley and Sons, Inc., New
York, 1968.

2. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables, 2nd
ed., McGraw-Hill Book Company, New York, 1970.

3. Box, G. E. P. and Tiao, G. C., Bayesian Inference in Statistical Analysis, Addison-Wesley
Publishing Co., Reading, Mass., 1973.

4. Brownlee, K. A., Statistical Theory and Methodology in Science and Engineering, 2nd ed.,
John Wiley and Sons, Inc., New York, 1965.

5. Kreith, F., Principles of Heat Transfer, 3rd ed., Intext Educational Publishers, New York,
1973.






PROBLEMS 

5.1 Prove using the standard assumptions that 

$$V(Y_i) = \sigma^2, \qquad E(Y_i Y_j) = E(Y_i) E(Y_j)$$

and indicate which assumptions are used. 

5.2 Show that 

, I ^ J ' ' 

and use to show that (5.2.37) follows from (5.2.36). 

5.3 What is the expected value of $e_i$ for OLS estimation when the following
assumptions apply?

(a) $Y_i = E(Y_i | \beta) + \varepsilon_i = \eta_i + \varepsilon_i = \beta X_i + \varepsilon_i$

(b) $E(\varepsilon_i) = \mu \neq 0$

(c) $V(X_i) = 0$

(d) $V(\beta) = 0$

5.4 Prove (5.2.43) for $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ when using OLS.

5.5 For Model 5 what weighting functions for maximum likelihood estimation 
would cause the sum of the residuals to be equal to zero? Assume that the 
assumptions designated 11011111 apply. 

5.6 Show that the minimum value of $S$ for $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is

I 

5.7 The following data are given:

i      1  2  3  4  5
X_i    1  2  3  4  5
Y_i    2  1  3  3  6

Assume that the standard assumptions apply.

(a) Find estimates of the parameters in $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$.

Answer. 0, 1.

(b) Find estimates of the parameters in $Y_i = \beta_0 + \beta_1 (X_i - \bar{X}) + \varepsilon_i$.

Answer. 3, 1.

(c) Give the residuals $e_i$. (Do they add up to zero?)

(d) Estimate the variance of $\varepsilon_i$.



Answer. 1.333

(e) Give the estimated standard error of $b_0$ for the model of part (a).
Answer. 1.211

(f) Give the estimated standard error of $b_0$ for the model of part (b).
Answer. 0.516

(g) Give the estimated standard error of $b_1$.

Answer. 0.365

(h) Give the estimated covariance of $b_0$ and $b_1$ for the model of part (a).
Answer. -0.4

(i) Give the estimated covariance of $b_0$ and $b_1$ for the model of part (b).
Answer. 0

5.8 The following data are given:

i      1    2    3    4    5    6    7    8    9    10
X_i    0    0    0    0    0    10   10   10   10   10
Y_i    110  95   90   100  105  40   50   55   45   60

Assume that the standard assumptions apply. Answer the same questions as
in Problem 5.7.

5.9 The following values have been reported for a certain set of experiments:

i      1      2      3      4      5      6      7
X_i    40     50     60     70     80     90     100
Y_i    0.325  0.332  0.340  0.347  0.353  0.359  0.364

Assume that the standard assumptions apply.

(a) Estimate $\beta_0$ and $\beta_1$ for the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$.

Answer. 0.2997, 0.000657

(b) Estimate variances for $b_0$ and $b_1$.
Answer. $2.38 \times 10^{-6}$, $4.49 \times 10^{-10}$

(c) Calculate $e_i$ and plot.

(d) Are the residuals correlated?

(e) Based on the conclusions of (d), are the estimates given in (b) valid?

(f) How could the model be improved?

5.10 A study was made on the effect of temperature on the yield of a chemical
process. The following data were collected with X linearly related to temperature
and Y to the yield:

X_i    -5  -4  -3  -2  -1   0   1   2   3   4   5
Y_i     0   4   3   6   9   7   8  12  13  12  17

Assume that the standard assumptions apply.

(a) For the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ estimate $\beta_0$ and $\beta_1$. What is the
prediction equation?

(b) Construct an analysis of variance table. Let the null hypothesis be that
$\beta_1 = 0$ with a risk 0.05.

(c) What are the 95% confidence limits for $\beta_1$?

(d) What are the confidence limits about $\eta_i$ at $X = 3$?

(e) Are there any indications that another model should be tried?

5.11 Consider the model 

$$Y_i = \beta + \varepsilon_i$$

where the standard assumptions apply for $\varepsilon_i$.

(a) Derive an unbiased, minimum variance estimator for $\beta$.

(b) Give an unbiased estimate of the variance of $\varepsilon_i$ ($\sigma^2$ is unknown).

5.12 Repeat Problem 5.11 for the model

$$Y_i = \beta X_i^2 + \varepsilon_i$$

5.13 Repeat Problem 5.11 for the model

$$Y_i = \beta \sin X_i + \varepsilon_i$$

5.14 Use the $\varepsilon_i$ column of Table 5.1 as data (that is, $Y_1 = -0.742$, $Y_2 = -0.034$,
etc.) and use the model of Problem 5.11.

(a) Estimate $\beta$.

(b) Estimate $\sigma$.

5.15 Repeat Example 5.2.4 with the $\varepsilon_i$ values replaced with nine consecutive
values of a column of Table XXIII of reference 2. The column is to be the 
one corresponding to your birth date and the first value used in the column 
is to correspond to the birthday month. For example, if your birthday is 
March 14, then pick the fourteenth column and start with the third entry 
since March is the third month. 

5.16 The temperature of a fluid flowing over a plate is nearly linear near the 
plate. Let Y be proportional to the temperature and X be the distance from 
the wall. The following results are obtained: 

X = 0.05, 2(X,~X)^ = 0.016, 7=300, 

2(A',-.f)(7,-7) = 80, 2(7,- F)'=8320, 2( 7, - 7 )'=8100 





Assume that the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and the standard assumptions
apply.

(a) Estimate $\beta_0$ and $\beta_1$.

(b) Prepare the analysis of variance table.

5.17 Show that $\sum (X_i - \bar{X})^2$ is maximized for $n$ even with $-R \le X_i \le R$ by
choosing one half of the $X_i$ to be at $-R$ and the other half to be at $R$.

5.18 Denvc an expression for cov(T„f,). thus proving (5 2 29) 

5.19 Derive the expression for y(b,~Pt) given in (5 4 12) 

5.20 Modify the analysis given m Section 5 5 to obtain estimates for /3, and m 
Model S when maximum likelihood estimation is used for 

and whatever other standard assumptions arc needed 
5.21 Consider MAP estimation for a random parameter for Model 5. Let the
standard assumptions implied by 11011110 be valid. All the measurements
are taken from the same batch. The random parameters $\beta_1$ and $\beta_2$ have the
joint density


The quantities Vf. Vj and &g must be greater than zero 
(a) Derive for i = I and y=-2 or i=2 andy“l 

b, = ^+ , A=hlll22l-Il2) 

[llIsC,|+I',d^', l22]sCjj+V,&g\ (12]=C,2-K„AJ' 


= 2 M Ai - M2Z,2 F.s- 


{b) It can be shown thal 






and 


[ 12 ] 

COv(6,-^l,fc2-^2)= ^ 


Show that by - can be put in the form 

liyAy + , 
bi~pi = + C 

where 

F2[22]+F,2[12] Ki2[22]+F,[12] 

'<< 1 , • r, 

B, = (Z„ [ 22 ] - Z ,2 [ 12])a ~ c is not a random variable 

Derive the given expression for F(Z)i — /S,). 

(c) The expressions given in (a) and (b) can also be applied for the case of 
subjective prior information. Reinterpret the meaning of /S,, ^ 2 , Vy, Fj, 
^ 12 . * 1 . ^ 2 . ^(^i-Z^i). and cov^by- Py,b 2 - Pi) for this case. 

5.22 Before measuring the thermal conductivity of a particular steel alloy, a 
research engineer has developed from experience knowledge relative to 
values for steel alloys in general. The thermal conductivity over a limited 
range of temperature can be described by the regression model $\eta_i = \beta_1 + \beta_2 X_i$
where $X_i$ is temperature in °C. This prior information regarding $\beta_1$ and $\beta_2$
can be described by $f(\beta_1, \beta_2)$ given by that in Problem 5.21 with $\mu_1 = 38$,
$\mu_2 = -0.01$, $V_1 = 2$, $V_2 = 10^{-6}$, and $V_{12} = -0.001$. Assume that the standard
assumptions designated 11011110 apply. Using the results of Problem 5.21
find $b_1$, $b_2$, $V(b_1 - \beta_1)$, $V(b_2 - \beta_2)$, and $\mathrm{cov}(b_1 - \beta_1, b_2 - \beta_2)$ for the following
data:


i    X_i (°C)    Y_i (W/m-°C)    sigma_i
1    100         36.3            0.2
2    200         36.3            0.3
3    300         34.6            0.5
4    400         32.9            0.7
5    600         31.2            1.0


5.23 Utilizing (5.13.8) derive for Model 2 ($\eta_i = \beta_1 X_i$) the following estimator for
first-order autoregressive errors


by = 


ZZ.F, 
A ’ 


i^=zz]- 


Z,-(A',-p,A',_,)a, 1 = 1,2, 

1 = 1,2,. ..,17 

and where Zo=0 and Fo=0. 





5.24 Simplify the results of Problem 5 23 for the case of Model 1 Show that 


fco" 


2 

A 


F(Ao)=A-‘. A=tf,-^+2(l-p,)V^ 

1 

5.25 For the case of first-order autoregressive errors show that the variance of e, 
given by (5 13.3) is the eomionf value 


when 


e?-<jJ(l -p’)'*, o’-ei for »-2, ,n 

5.26 (d) Using the results of Problem 5 25 show that A of Problem S 24 can be 
written as 


F(5o)' 


* 2p+n(l -p) 


(fc) Suppose that as n becomes larger the measutements become more 
correlated as indicated by the e*pressionp“exp(-a/n) where a ts some 
positive constant characteristic of the errors Show that 

^Im ■=<«,* 

for fixed What s$ the physical significance of this result'’ 

(e) Modify the result of past (a) for f«ed and p and large n Whai is the 
physical significance of this result’ 

5.27 The following are actual data obtained for the thermal conductivity $k$ of
Pyrex. The temperature $T$ (in K) is related to the voltage $V$ (in mV) by
$T = 301.6 + 18.24 V$.





Test V, (mV) k, (W/m-K) Test V, 

1 oi UTS H 6Jn 1.129 

2 8.35 1.133 12 5.96 1.133 

3 7.97 1.148 13 5.75 1.101 

4 7.66 1.159 14 5.51 1.101 

5 7.38 1.148 15 5.34 1.091 

6 7.10 1.136 16 5.10 1.087 

7 6.86 1.144 17 4.77 1.084 

8 6.64 1.136 18 4.52 1.087 

9 6.44 1.133 19 4.19 1.080 

10 6.27 1.136 20 3.87 1.058 


For the model $k_i = \beta_0 + \beta_1 T_i$ find the estimates $b_0$ and $b_1$. Also find est.
s.e.($b_0$), est. s.e.($b_1$), $s$, and $e_i$. Use the computer.

5.28 Modify the program for Problem 5.27 for variable $\sigma_i^2$. Let $\sigma_i^2$ be a table of
input values. In particular, let $\sigma_i^2 = 0.01 V_i$ and obtain new $b_0$ and $b_1$ values for
the data of Problem 5.27.

5.29 The Moody chart provides the following data for the friction factor $f_{DW}$ as a
function of the Reynolds number Re for a roughness ratio $\varepsilon/D = 0.0001$. Fit
these data to an equation of the form $f_{DW} = c + a\,\mathrm{Re}^{b}$ using the linear OLS
method with $c$ set equal to 0.0118. (Notice that $f_{DW}$ is approaching a
constant for large values of Re.)

Re          f_DW      Re          f_DW
5 x 10^3    0.0370    5 x 10^6    0.0123
1 x 10^4    0.0310    1 x 10^7    0.0121
5 x 10^4    0.0214    5 x 10^7    0.0120
1 x 10^5    0.0180    1 x 10^8    0.0120
5 x 10^5    0.0145
1 x 10^6    0.0135

Let $\log(f_{DW} - c)$ be the dependent variable and $\log \mathrm{Re}$ be the independent
variable. Calculate also the residuals in terms of $f_{DW} - \hat{f}_{DW}$ and the relative
residuals, $(f_{DW} - \hat{f}_{DW})/f_{DW}$.

5.30 The United States draft lottery issued in March 1975 gave the call order for
the standby draft for men born in 1956. Results for birthday months of April 
and September are given below. 







1 no 

94)25 

17 264 

25 no 

2 228 

IQ 147 

(8 134 

26-053 

3 008 

11 031 

19 036 

27 277 

4 340 

12 133 

20-359 

28 050 

3 005 

13 20S 

21 183 

29 105 

6 092 

144)47 

22 101 

30 343 

7 303 

15-093 

23 280 


a 180 

16 13) 

244)80 



September 


1 175 

9 349 

17 307 

25 2M 

2 263 

!0 347 

18-0(9 

26-231 

3 087 

II 173 

19-041 

27 022 

4 199 

12 161 

20-230 

28 102 

5 236 

13 325 

21-086 

29-089 

6 221 

14-343 

22 128 

30 064 

7 322 

15 135 

23 156 


8 341 

16117 

24 227 



Use the model

(a) Which standard assumptions are valid?

(b) Estimate $\beta$ using OLS using the April data.

(c) Estimate $\sigma^2$ using the April data.

5.31 Repeat Problem 5.30b and c using the September data.

5.32 Using the April data in Problem 5.30 estimate $\beta_0$ and $\beta_1$ in the model
$\eta = \beta_0 + \beta_1 X$ using OLS. Also estimate their standard errors.



CHAPTER 6

MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION


6.1 INTRODUCTION TO MATRIX NOTATION AND OPERATIONS 

The extension of parameter estimation to more than two parameters is
effectively accomplished through the use of matrices. The notation becomes
more compact, facilitating manipulations, encouraging further insights,
and permitting greater generality. This chapter develops matrix
methods for linear parameter estimation and Chapter 7 considers the
nonlinear case.

Linear estimation requires that the model be linear in the parameters. 
For linear maximum likelihood estimation, it is also necessary that the 
independent variables be errorless and that the covariances of the measure- 
ment errors be known to within the same multiplicative constant. 

Before discussing various estimation procedures, this section presents 
various properties of matrices and matrix calculus that are used in both 
linear and nonlinear parameter estimation. 


6.1.1 Elementary Matrix Operations 

A matrix $\mathbf{Y}$ consisting of a single column is called a column vector. We use
boldface letters to designate matrices. In display form $\mathbf{Y}$ is written as

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \tag{6.1.1}$$

Square brackets are used to enclose the elements of a matrix. This matrix
has $n$ elements and is sometimes called an $n \times 1$ matrix, that is, $n$ rows and
one column.

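An $m \times n$ rectangular matrix $\mathbf{A}$ is given by

$$\mathbf{A} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \tag{6.1.2}$$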


In the notation for an element, the first subscript refers to the row and
the second to the column. You may find it helpful in memorizing this
order to visualize moving down the rows first (subscript $i$) and then across
the columns (subscript $j$).
If $m = n$ in (6.1.2), the matrix $\mathbf{A}$ is said to be square. If the matrix is
square and also $a_{ij} = a_{ji}$ for all $i$ and $j$, the matrix is termed symmetric.

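6.1.1.1 Product of Matrices

The product of $\mathbf{A}$ times $\mathbf{B}$, where $\mathbf{A}$ is an $m \times n$ matrix and $\mathbf{B}$ is $n \times s$, is an
$m \times s$ matrix $\mathbf{C}$,

$$\mathbf{C} = \mathbf{A}\mathbf{B} = [a_{ik}][b_{kj}] = \left[ \sum_{k=1}^{n} a_{ik} b_{kj} \right] = [c_{ij}] \tag{6.1.3a}$$

Note that it is meaningful to form the product $\mathbf{A}$ times $\mathbf{B}$ only when the
number of columns in $\mathbf{A}$ is equal to the number of rows in $\mathbf{B}$. A square
matrix $\mathbf{A}$ is said to be idempotent if $\mathbf{A}\mathbf{A} = \mathbf{A}$.
The triple matrix product $\mathbf{A}\mathbf{B}\mathbf{C}$, where $\mathbf{A}$ is an $m \times n$ matrix, $\mathbf{B}$ is $n \times s$,
and $\mathbf{C}$ is $s \times r$, is found using (6.1.3a) to be

$$\mathbf{A}\mathbf{B}\mathbf{C} = [a_{ik}][b_{kj}][c_{jl}] = [d_{il}] = \left[ \sum_{k=1}^{n} \sum_{j=1}^{s} a_{ik} b_{kj} c_{jl} \right] = \mathbf{D} \tag{6.1.3b}$$

where $\mathbf{D}$ is an $m \times r$ matrix.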
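
The product rules can be checked numerically. The following is a minimal sketch using NumPy (an assumption; the text does not prescribe any software). It forms a product both with the built-in operator and from the defining sum in (6.1.3a), forms a triple product, and builds one idempotent matrix. The projection matrix used as the idempotent example is an illustration chosen here, not one taken from the text.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 x 2
B = np.array([[1.0, 0.0], [2.0, 1.0]])               # 2 x 2
C = np.array([[1.0], [1.0]])                         # 2 x 1

# Product with the built-in matrix operator.
AB = A @ B

# Same product from the defining sum over the shared index k.
AB_sum = np.array([[sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
                    for j in range(B.shape[1])]
                   for i in range(A.shape[0])])

# Triple product: (3 x 2)(2 x 2)(2 x 1) gives a 3 x 1 matrix.
D = A @ B @ C

# An idempotent matrix: the projection onto the column space of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(AB.shape, D.shape)
```

The shapes follow the rule stated above: the inner dimensions must agree and the outer dimensions give the size of the result.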

Example 6.1.1 

For the matrix A being 3x2 and B being 2x2, find the product AB. 

Solution 

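Using (6.1.3a) we get

$$\mathbf{A}\mathbf{B} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \\ a_{31}b_{11} + a_{32}b_{21} & a_{31}b_{12} + a_{32}b_{22} \end{bmatrix}$$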


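6.1.1.2 Transpose of Matrix

The transpose of an $m \times n$ matrix $\mathbf{A}$ is given by

$$\mathbf{A}^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} \tag{6.1.4}$$

If $\mathbf{A}$ is a symmetric matrix, $\mathbf{A}^T = \mathbf{A}$. The transpose of the product $\mathbf{A}\mathbf{B}$ is
equal to

$$(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T \mathbf{A}^T \tag{6.1.5a}$$

which can be extended to the product of any finite number of matrices,

$$(\mathbf{A} \cdots \mathbf{K})^T = \mathbf{K}^T \cdots \mathbf{A}^T \tag{6.1.5b}$$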


6.1.1.3 Inverse, Determinant, and Eigenvalues 

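For a square matrix $\mathbf{A}$, the notation $\mathbf{A}^{-1}$ indicates the inverse of $\mathbf{A}$. A
nonsingular matrix is a square matrix whose determinant is not zero.
(Examples of determinants are given below.) It can be shown that if and
only if the matrix $\mathbf{A}$ is nonsingular it possesses an inverse $\mathbf{A}^{-1}$; that is,

$$\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \tag{6.1.6}$$

where $\mathbf{I}$ is a diagonal matrix and is called the unit or identity matrix. $\mathbf{I}$ is
given by

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \mathrm{diag}[1 \; 1 \; \cdots \; 1] \tag{6.1.7}$$

Notice the notation diag in (6.1.7), which indicates the elements of a
diagonal matrix.
Let $\boldsymbol{\Phi}$ be a diagonal matrix,

$$\boldsymbol{\Phi} = \mathrm{diag}[\phi_{11} \; \phi_{22} \; \cdots \; \phi_{nn}] \tag{6.1.8}$$

Its inverse and determinant are given by

$$\boldsymbol{\Phi}^{-1} = \mathrm{diag}[\phi_{11}^{-1} \; \phi_{22}^{-1} \; \cdots \; \phi_{nn}^{-1}] \tag{6.1.9a}$$

$$|\boldsymbol{\Phi}| = \phi_{11} \phi_{22} \cdots \phi_{nn} \tag{6.1.9b}$$

Evidently diagonal matrices have some very convenient mathematical
properties.

A $2 \times 2$ matrix $\mathbf{A}$ and its inverse are

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad \mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{6.1.10a}$$

where $|\mathbf{A}|$ is the determinant of $\mathbf{A}$,

$$|\mathbf{A}| = a_{11} a_{22} - a_{12} a_{21} \tag{6.1.10b}$$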



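The inverse of the symmetric $3 \times 3$ matrix $\mathbf{A}$ is

$$\mathbf{A}^{-1} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} a_{22}a_{33} - a_{23}^2 & a_{13}a_{23} - a_{12}a_{33} & a_{12}a_{23} - a_{13}a_{22} \\ & a_{11}a_{33} - a_{13}^2 & a_{12}a_{13} - a_{11}a_{23} \\ \text{symmetric} & & a_{11}a_{22} - a_{12}^2 \end{bmatrix} \tag{6.1.11a}$$

$$|\mathbf{A}| = a_{11}a_{22}a_{33} + 2a_{12}a_{13}a_{23} - \left( a_{11}a_{23}^2 + a_{22}a_{13}^2 + a_{33}a_{12}^2 \right) \tag{6.1.11b}$$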


Clearly nondiagonal matrices are much more difficult to invert than 
diagonal matrices. For the inverses to exist the determinants given by 
(6.1.9b), (6.1.10b), and (6.1.11b) cannot be equal to zero. For a method for 
evaluating higher-order determinants, see Hildebrand [1]. See Problem 6.4 
for a method for finding inverses of larger matrices. 

The determinant of the product of two square matrices is

$$|\mathbf{A}\mathbf{B}| = |\mathbf{A}||\mathbf{B}| \tag{6.1.12a}$$

If (6.1.12a) is applied to (6.1.6), the reciprocal of $|\mathbf{A}^{-1}|$ is found to be equal
to $|\mathbf{A}|$,

$$\frac{1}{|\mathbf{A}^{-1}|} = |\mathbf{A}| \tag{6.1.12b}$$

Another convenient relation involving products of two square $n \times n$
matrices $\mathbf{A}$ and $\mathbf{B}$ is

$$(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1} \tag{6.1.13}$$

which is a relation similar to (6.1.5) for transposes.

For any square nonsingular matrix $\mathbf{A}$ it is also true that

$$(\mathbf{A}^{-1})^T = (\mathbf{A}^T)^{-1} \tag{6.1.14}$$

or in words, the transpose of an inverse is equal to the inverse of the
transpose.

An eigenvalue (also called characteristic value or latent root) $\lambda_i$ of a
square matrix $\mathbf{A}$ is obtained by solving the equation

$$|\mathbf{A} - \lambda \mathbf{I}| = 0 \tag{6.1.15}$$

There are $n$ eigenvalues if $\mathbf{A}$ is $n \times n$ (counting repeated values).
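
The relations (6.1.10) through (6.1.15) are easy to confirm on a small example. The sketch below assumes NumPy and uses two arbitrary nonsingular 2 x 2 matrices chosen only for illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
B = np.array([[2.0, 0.0], [1.0, 5.0]])
I = np.eye(2)

# Determinant and inverse of a 2 x 2 matrix written out explicitly,
# following (6.1.10b) and (6.1.10a).
det_A = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv = np.array([[A[1, 1], -A[0, 1]],
                  [-A[1, 0], A[0, 0]]]) / det_A

# Eigenvalues are the roots of |A - lambda I| = 0, as in (6.1.15).
eigvals = np.linalg.eigvals(A)
char_dets = [np.linalg.det(A - lam * I) for lam in eigvals]

print(det_A, np.sort(eigvals.real))
```

For this matrix the characteristic equation is $\lambda^2 - 7\lambda + 10 = 0$, so the eigenvalues are 2 and 5.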

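6.1.1.4 Partitioned Matrix

Let $\mathbf{B}$ be an $n \times n$ matrix that is partitioned as follows:

$$\mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix} \tag{6.1.16a}$$

where $\mathbf{B}_{ij}$ has size $n_i \times n_j$, $i, j = 1, 2$, and where $n_1 + n_2 = n$. Suppose that
$|\mathbf{B}| \neq 0$, $|\mathbf{B}_{11}| \neq 0$, and $|\mathbf{B}_{22}| \neq 0$. Set $\mathbf{A} = \mathbf{B}^{-1}$ and partition $\mathbf{A}$ as

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix} \tag{6.1.16b}$$

where $\mathbf{A}_{ij}$ has size $n_i \times n_j$ for $i, j = 1, 2$. The components $\mathbf{A}_{ij}$ are

$$\mathbf{A}_{11} = [\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21}]^{-1} = \mathbf{B}_{11}^{-1} + \mathbf{B}_{11}^{-1}\mathbf{B}_{12}\mathbf{A}_{22}\mathbf{B}_{21}\mathbf{B}_{11}^{-1} \tag{6.1.17a}$$

$$\mathbf{A}_{12} = -\mathbf{B}_{11}^{-1}\mathbf{B}_{12}[\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12}]^{-1} = -\mathbf{B}_{11}^{-1}\mathbf{B}_{12}\mathbf{A}_{22} \tag{6.1.17b}$$

$$\mathbf{A}_{22} = [\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12}]^{-1} = \mathbf{B}_{22}^{-1} + \mathbf{B}_{22}^{-1}\mathbf{B}_{21}\mathbf{A}_{11}\mathbf{B}_{12}\mathbf{B}_{22}^{-1} \tag{6.1.17c}$$

$$\mathbf{A}_{21} = -\mathbf{B}_{22}^{-1}\mathbf{B}_{21}[\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21}]^{-1} = -\mathbf{B}_{22}^{-1}\mathbf{B}_{21}\mathbf{A}_{11} \tag{6.1.17d}$$

The determinant of $\mathbf{B}$ is

$$|\mathbf{B}| = |\mathbf{B}_{11}|\,|\mathbf{B}_{22} - \mathbf{B}_{21}\mathbf{B}_{11}^{-1}\mathbf{B}_{12}| = |\mathbf{B}_{22}|\,|\mathbf{B}_{11} - \mathbf{B}_{12}\mathbf{B}_{22}^{-1}\mathbf{B}_{21}| \tag{6.1.17e}$$

If $\mathbf{B}_{12}$ and $\mathbf{B}_{21}$ are null matrices (only zero components), then the inverse of
$\mathbf{B}$ is more simply given by

$$\mathbf{B}^{-1} = \mathrm{diag}[\mathbf{B}_{11}^{-1} \; \mathbf{B}_{22}^{-1}] \tag{6.1.17f}$$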
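
As a check on the partitioned-inverse relations, the blocks can be assembled numerically and compared with a direct inverse. This is an illustrative sketch assuming NumPy; the 3 x 3 matrix and the split $n_1 = 2$, $n_2 = 1$ are arbitrary choices, and the block formulas used are the standard Schur-complement forms that the relations (6.1.17a) through (6.1.17d) express.

```python
import numpy as np

inv = np.linalg.inv

B = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
n1 = 2  # size of the leading block

B11, B12 = B[:n1, :n1], B[:n1, n1:]
B21, B22 = B[n1:, :n1], B[n1:, n1:]

# Diagonal blocks of the inverse from the two Schur complements.
A11 = inv(B11 - B12 @ inv(B22) @ B21)
A22 = inv(B22 - B21 @ inv(B11) @ B12)
# Off-diagonal blocks expressed through the diagonal ones.
A12 = -inv(B11) @ B12 @ A22
A21 = -inv(B22) @ B21 @ A11

A = np.block([[A11, A12], [A21, A22]])

# Determinant from the leading block and its Schur complement.
det_from_blocks = np.linalg.det(B11) * np.linalg.det(B22 - B21 @ inv(B11) @ B12)

print(np.allclose(A, inv(B)))
```

The practical value of the block form is that only matrices of size $n_1$ and $n_2$ are inverted, which helps when one block is diagonal or already factored.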


6.1.1.5 Positive Definite Matrices

Necessary and sufficient conditions for the square symmetric matrix $\mathbf{A}$ to
be positive definite are that the inequalities,

$$|a_{11}| > 0, \qquad \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix} > 0, \qquad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{vmatrix} > 0, \quad \text{etc.} \tag{6.1.18}$$

be satisfied. $\mathbf{A}$ is positive semidefinite if the strict inequalities of (6.1.18) are
replaced by $\ge$ signs. Negative definite and negative semidefinite matrices
are defined by reversing the sense of the inequalities.
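
The determinant test in (6.1.18) for positive definiteness translates directly into code. The sketch below assumes NumPy; the function names are hypothetical. It evaluates the leading principal minors and, as an independent cross-check that is not part of the text, compares the verdict with the signs of the eigenvalues. Only the strict (positive definite) case is tested here.

```python
import numpy as np

def leading_minors(A):
    """Determinants of the upper-left k x k blocks, k = 1, ..., n."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def is_positive_definite(A):
    """Apply the strict inequalities of (6.1.18) to a symmetric matrix."""
    return all(m > 0.0 for m in leading_minors(A))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])     # leading minors 2, 3, 4
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])           # leading minors 1, -3

print(is_positive_definite(A), is_positive_definite(B))
```

For matrices of realistic size a Cholesky factorization is usually a cheaper and numerically safer test than evaluating determinants; the minors are used here only because they mirror the stated conditions.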


6.1.1.6 Trace 

The trace of a square matrix $\mathbf{A}$ is the sum of the diagonal elements and is
also equal to the sum of the eigenvalues $\lambda_i$,

$$\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i \tag{6.1.19}$$

It can also be shown that the determinant of $\mathbf{A}$ is equal to the product of
the eigenvalues,

$$|\mathbf{A}| = \prod_{i=1}^{n} \lambda_i \tag{6.1.20}$$
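
A short numerical check of (6.1.19) and (6.1.20), assuming NumPy and an arbitrary symmetric test matrix:

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

lam = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix

trace_matches = np.isclose(np.trace(A), lam.sum())       # (6.1.19)
det_matches = np.isclose(np.linalg.det(A), lam.prod())   # (6.1.20)

print(trace_matches, det_matches)
```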


6.1.2 Matrix Calculus 

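In parameter estimation the first derivative with respect to the elements of
a $p$ vector of parameters, $\boldsymbol{\beta}$, is frequently needed. Let us define the
operator $\nabla_\beta$ by

$$\nabla_\beta = \begin{bmatrix} \partial/\partial\beta_1 \\ \vdots \\ \partial/\partial\beta_p \end{bmatrix} \tag{6.1.21}$$

Because $\nabla_\beta$ is $p \times 1$ it must be applied to the transpose of a column vector
or to a row vector. Let $\mathbf{C}$ be an $n \times 1$ column vector which is a function of
$\boldsymbol{\beta}$. Then we obtain

$$\nabla_\beta \mathbf{C}^T = \begin{bmatrix} \partial/\partial\beta_1 \\ \vdots \\ \partial/\partial\beta_p \end{bmatrix} [c_1 \; c_2 \; \cdots \; c_n] = \begin{bmatrix} \dfrac{\partial c_1}{\partial\beta_1} & \dfrac{\partial c_2}{\partial\beta_1} & \cdots & \dfrac{\partial c_n}{\partial\beta_1} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial c_1}{\partial\beta_p} & \dfrac{\partial c_2}{\partial\beta_p} & \cdots & \dfrac{\partial c_n}{\partial\beta_p} \end{bmatrix} \tag{6.1.22}$$

which is a $p \times n$ matrix and whose determinant is called the Jacobian.
Many times the matrix derivative is applied to a scalar. Consider $\nabla_\beta$
operating upon the matrix product $\mathbf{A}\mathbf{B}$ where $\mathbf{A}$ is $1 \times n$ and $\mathbf{B}$ is $n \times 1$; then

$$\nabla_\beta^{[p \times 1]} \left( \mathbf{A}^{[1 \times n]} \mathbf{B}^{[n \times 1]} \right) = \left( \nabla_\beta \mathbf{A} \right)^{[p \times n]} \mathbf{B}^{[n \times 1]} + \left( \nabla_\beta \mathbf{B}^T \right)^{[p \times n]} \left( \mathbf{A}^T \right)^{[n \times 1]} \tag{6.1.23}$$

The sizes of the matrices are indicated by the superscripts in brackets. One
application of (6.1.23) is for $\mathbf{A} = \mathbf{C}^T$, $\mathbf{B} = \boldsymbol{\beta}$, and $\mathbf{C}$ a $p \times 1$ column vector
whose elements are not functions of the $\beta$'s. Using (6.1.23) gives

$$\nabla_\beta (\mathbf{C}^T \boldsymbol{\beta}) = (\nabla_\beta \boldsymbol{\beta}^T)\mathbf{C} = \mathbf{C} \tag{6.1.24}$$

since

$$\nabla_\beta \boldsymbol{\beta}^T = \mathbf{I} \tag{6.1.25}$$

Another important application of the matrix derivative $\nabla_\beta$ is operating
on the product $\mathbf{A}\mathbf{B}$ where $\mathbf{A}$ is a $1 \times n$ matrix that is a function of $\boldsymbol{\beta}$ and $\mathbf{B}$
is an $n \times m$ matrix which is not a function of $\boldsymbol{\beta}$. In the same manner that
(6.1.23) was derived we obtain

$$\nabla_\beta (\mathbf{A}\mathbf{B}) = (\nabla_\beta \mathbf{A})\mathbf{B} \tag{6.1.26a}$$

If $\mathbf{A} = \boldsymbol{\beta}^T$ and $\mathbf{B}$ has the size $p \times m$, (6.1.26a) then yields

$$\nabla_\beta^{[p \times 1]} \left( \boldsymbol{\beta}^{T\,[1 \times p]} \mathbf{B}^{[p \times m]} \right) = \mathbf{I}^{[p \times p]} \mathbf{B}^{[p \times m]} = \mathbf{B} \tag{6.1.26b}$$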







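6.1.3 Quadratic Form

In estimation, sum of squares functions are frequently encountered. These
functions can be considered to be quadratic forms $Q$,

$$Q = \mathbf{A}^T \boldsymbol{\Phi} \mathbf{A} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i \phi_{ij} a_j \tag{6.1.27a}$$

$$= a_1^2 \phi_{11} + \cdots + a_n^2 \phi_{nn} + 2a_1 a_2 \phi_{12} + 2a_1 a_3 \phi_{13} + \cdots \tag{6.1.27b}$$

where $\boldsymbol{\Phi}$ is a symmetric positive semidefinite matrix. $Q$ is termed positive
definite if the conditions given in (6.1.18) are satisfied by $\boldsymbol{\Phi}$ [1, pp. 48-52];
for these conditions $Q$ is always nonnegative for all real values of the
variables $a_i$. In some analytical considerations the equivalent expression for
$Q$,

$$Q = \mathrm{tr}(\boldsymbol{\Phi} \mathbf{A} \mathbf{A}^T) \tag{6.1.28}$$

is more convenient than (6.1.27).

Applying the operator $\nabla_\beta$ to $Q$ defined by (6.1.27), assuming that $\mathbf{A}$ is a
function of $\boldsymbol{\beta}$ and $\boldsymbol{\Phi}$ is not, gives

$$\nabla_\beta Q = \nabla_\beta (\mathbf{A}^T \boldsymbol{\Phi} \mathbf{A}) = (\nabla_\beta \mathbf{A}^T) \boldsymbol{\Phi} \mathbf{A} + [\nabla_\beta (\mathbf{A}^T \boldsymbol{\Phi})] \mathbf{A} \tag{6.1.29}$$

$$\nabla_\beta Q = 2 (\nabla_\beta \mathbf{A}^T) \boldsymbol{\Phi} \mathbf{A} \tag{6.1.30}$$

where (6.1.23) and (6.1.26a) are used.

In linear estimation, $\mathbf{A}$ in the quadratic form is given by

$$\mathbf{A}^{[n \times 1]} = \mathbf{X}^{[n \times p]} \boldsymbol{\beta}^{[p \times 1]} \tag{6.1.31}$$

where $\mathbf{X}$ is independent of $\boldsymbol{\beta}$. Then substituting (6.1.31) into (6.1.30) gives

$$\nabla_\beta Q = 2 (\nabla_\beta \boldsymbol{\beta}^T \mathbf{X}^T) \boldsymbol{\Phi} \mathbf{X} \boldsymbol{\beta} \tag{6.1.32}$$

$$\nabla_\beta Q = 2 \mathbf{X}^T \boldsymbol{\Phi} \mathbf{X} \boldsymbol{\beta}; \qquad Q = (\mathbf{X}\boldsymbol{\beta})^T \boldsymbol{\Phi} \mathbf{X} \boldsymbol{\beta} \tag{6.1.33}$$

where (6.1.26b) is used.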
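
The gradient result in (6.1.33) can be verified against a finite-difference approximation. The sketch below assumes NumPy and uses randomly generated $\mathbf{X}$, $\boldsymbol{\Phi}$, and $\boldsymbol{\beta}$ purely as test data; the step size and seed are arbitrary. It also confirms the trace form (6.1.28) of the quadratic form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))
M = rng.normal(size=(n, n))
Phi = M @ M.T                      # symmetric positive semidefinite
beta = rng.normal(size=p)

def Q(b):
    """Quadratic form Q = (X b)^T Phi (X b)."""
    A = X @ b
    return A @ Phi @ A

# Closed-form gradient, 2 X^T Phi X beta.
grad_formula = 2.0 * X.T @ Phi @ X @ beta

# Central differences, one parameter at a time.
h = 1e-6
grad_fd = np.array([(Q(beta + h * e) - Q(beta - h * e)) / (2.0 * h)
                    for e in np.eye(p)])

# Trace form of the same quadratic form, tr(Phi A A^T).
A_vec = X @ beta
Q_trace = np.trace(Phi @ np.outer(A_vec, A_vec))

print(np.max(np.abs(grad_formula - grad_fd)))
```

Because $Q$ is exactly quadratic in $\boldsymbol{\beta}$, the central difference has no truncation error and the two gradients differ only by rounding.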





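6.1.4 Expected Value of Matrix and Variance-Covariance Matrix

6.1.4.1 Expected Value Matrix

The integral of a matrix is given by

$$\int \mathbf{A}\, dt = \left[ \int a_{ij}\, dt \right] \tag{6.1.34}$$

This result can be used to find the expected value of a column vector
$\mathbf{Y}^{[n \times 1]}$,

$$E(\mathbf{Y}^{[n \times 1]}) = \begin{bmatrix} E(Y_1) \\ \vdots \\ E(Y_n) \end{bmatrix} = \begin{bmatrix} \int Y_1 f(Y_1)\, dY_1 \\ \vdots \\ \int Y_n f(Y_n)\, dY_n \end{bmatrix} \tag{6.1.35}$$

where $f(Y_i)$ is the probability density of $Y_i$.

6.1.4.2 Variance-Covariance Matrix

An important application of the expected value of a matrix is the
variance-covariance matrix of a random column vector $\mathbf{Y}$, which is

$$\mathrm{cov}(\mathbf{Y}^{[n \times 1]}) = E\{[\mathbf{Y} - E(\mathbf{Y})][\mathbf{Y} - E(\mathbf{Y})]^T\}$$

$$= E\left\{ \begin{bmatrix} Y_1 - E(Y_1) \\ \vdots \\ Y_n - E(Y_n) \end{bmatrix} [\,Y_1 - E(Y_1) \;\; \cdots \;\; Y_n - E(Y_n)\,] \right\}$$

$$= \begin{bmatrix} \mathrm{cov}(Y_1, Y_1) & \mathrm{cov}(Y_1, Y_2) & \cdots & \mathrm{cov}(Y_1, Y_n) \\ \mathrm{cov}(Y_1, Y_2) & \mathrm{cov}(Y_2, Y_2) & \cdots & \mathrm{cov}(Y_2, Y_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(Y_1, Y_n) & \mathrm{cov}(Y_2, Y_n) & \cdots & \mathrm{cov}(Y_n, Y_n) \end{bmatrix} \tag{6.1.36}$$

Notice that this is a symmetric $n \times n$ matrix.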




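An important special case in connection with the covariance matrix
(sometimes this term is used rather than variance-covariance) is for $\mathbf{Y}$
containing additive zero mean errors $\boldsymbol{\varepsilon}$, or

$$\mathbf{Y} = \boldsymbol{\eta} + \boldsymbol{\varepsilon}; \qquad E(\boldsymbol{\varepsilon}) = \mathbf{0} \text{ and } \mathrm{cov}(\boldsymbol{\eta}) = \mathbf{0} \tag{6.1.37}$$

Let $\boldsymbol{\psi}$ be defined as the covariance matrix of $\mathbf{Y}$. For the conditions given in
(6.1.37), $\boldsymbol{\psi}$ is also equal to $\mathrm{cov}(\boldsymbol{\varepsilon})$, or

$$\boldsymbol{\psi} = \mathrm{cov}(\mathbf{Y}) = \mathrm{cov}(\boldsymbol{\varepsilon}) \tag{6.1.38a}$$

where $\boldsymbol{\psi}$ is in detail

$$\boldsymbol{\psi} = \begin{bmatrix} E(\varepsilon_1^2) & E(\varepsilon_1 \varepsilon_2) & \cdots & E(\varepsilon_1 \varepsilon_n) \\ & E(\varepsilon_2^2) & \cdots & E(\varepsilon_2 \varepsilon_n) \\ \text{symmetric} & & \ddots & \vdots \\ & & & E(\varepsilon_n^2) \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & E(\varepsilon_1 \varepsilon_2) & \cdots & E(\varepsilon_1 \varepsilon_n) \\ & \sigma_2^2 & \cdots & E(\varepsilon_2 \varepsilon_n) \\ \text{symmetric} & & \ddots & \vdots \\ & & & \sigma_n^2 \end{bmatrix} \tag{6.1.38b}$$

since $\sigma_i^2 = E(\varepsilon_i^2)$. In many estimation methods, $\boldsymbol{\psi}$ given by (6.1.38b)
plays an important role.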


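6.1.4.3 Covariance of Linear Combination of Vector Random Variables

Let $\mathbf{z}$ be linear in the random vector $\boldsymbol{\varepsilon}$, or

$$\mathbf{z}^{[m \times 1]} = \mathbf{G}^{[m \times n]} \boldsymbol{\varepsilon}^{[n \times 1]} \tag{6.1.39}$$

The matrix $\mathbf{G}$ is not a random matrix. It can be proved that the covariance
matrix of $\mathbf{z}$ is given by

$$\mathrm{cov}(\mathbf{z}) = E(\mathbf{z}\mathbf{z}^T) = E[\mathbf{G} \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^T \mathbf{G}^T] = \mathbf{G} \boldsymbol{\psi} \mathbf{G}^T \tag{6.1.40}$$

where $\boldsymbol{\psi} = \mathrm{cov}(\boldsymbol{\varepsilon})$.

A derivation of (6.1.40) is given next. Let

$$\mathbf{H} = \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^T = [\varepsilon_j \varepsilon_k] \tag{6.1.41}$$

and then $E(\mathbf{z}\mathbf{z}^T)$ can be written as

$$E(\mathbf{z}\mathbf{z}^T) = E(\mathbf{G}\mathbf{H}\mathbf{G}^T) = E\left[ \sum_{j} \sum_{k} g_{ij} h_{jk} g_{lk} \right] \tag{6.1.42}$$

where $j, k = 1, 2, \ldots, n$ and where we used

$$\boldsymbol{\varepsilon} = [\varepsilon_j], \qquad \mathbf{G} = [g_{ij}] \tag{6.1.43}$$

and (6.1.3b). Using the linearity property of the expected value operator,

$$E(a\varepsilon_1 + b\varepsilon_2) = aE(\varepsilon_1) + bE(\varepsilon_2) \tag{6.1.44}$$

permits (6.1.42) to be written as

$$E(\mathbf{z}\mathbf{z}^T) = \left[ \sum_{j} \sum_{k} g_{ij} E(h_{jk}) g_{lk} \right] = \mathbf{G} E(\mathbf{H}) \mathbf{G}^T \tag{6.1.45}$$

Since $\boldsymbol{\psi} = E(\mathbf{H})$, we have proved (6.1.40).

A more general form of (6.1.40) which can also be proved is

$$\mathrm{cov}(\mathbf{z}) = \mathbf{G}\, \mathrm{cov}(\boldsymbol{\varepsilon})\, \mathbf{G}^T \tag{6.1.46}$$

where the expected values of the components of $\boldsymbol{\varepsilon}$ need not be zero [2].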
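
The propagation rule for the covariance of a linear combination can be illustrated with simulated data. The sketch below assumes NumPy; the matrices and the random seed are arbitrary. It relies on the fact that the sample covariance obeys the same linear transformation rule as the population covariance, so the two sides agree to rounding error rather than only approximately.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 500                     # errors, combinations, samples

# Correlated errors with nonzero mean: one column per sample.
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, -0.3, 1.0]])
eps = L @ rng.normal(size=(n, N)) + np.array([[1.0], [-2.0], [0.5]])

G = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])        # nonrandom m x n matrix
z = G @ eps

psi_hat = np.cov(eps)                   # n x n sample covariance of the errors
cov_z_hat = np.cov(z)                   # m x m sample covariance of z

print(np.max(np.abs(cov_z_hat - G @ psi_hat @ G.T)))
```

The errors here are deliberately given a nonzero mean, which corresponds to the more general statement (6.1.46).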
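6.1.4.4 Expected Value of a Quadratic Form

The quadratic form given by (6.1.27) and (6.1.28) is

$$Q = \mathbf{A}^T \boldsymbol{\Phi} \mathbf{A} = \mathrm{tr}(\boldsymbol{\Phi} \mathbf{A} \mathbf{A}^T) \tag{6.1.47}$$

Let $\mathbf{A}$ be an $[n \times 1]$ random vector but $\boldsymbol{\Phi}$ be not a random matrix. Using
the expanded form of a matrix product given by (6.1.3b) and the linearity
property of the expected value operator indicated by (6.1.44) it can be
shown that

$$E(Q) = \mathrm{tr}\{\boldsymbol{\Phi} E(\mathbf{A}\mathbf{A}^T)\} \tag{6.1.48}$$

In general the covariance matrix of $\mathbf{A}$ is equal to

$$\mathrm{cov}(\mathbf{A}) = E(\mathbf{A}\mathbf{A}^T) - E(\mathbf{A})E(\mathbf{A}^T) \tag{6.1.49}$$

Using this equation in (6.1.48) and also using (6.1.47) yields

$$E(Q) = E(\mathbf{A}^T) \boldsymbol{\Phi} E(\mathbf{A}) + \mathrm{tr}\{\boldsymbol{\Phi}\, \mathrm{cov}(\mathbf{A})\} \tag{6.1.50}$$

This expression can be used to obtain an estimator for $\sigma^2$ for certain cases.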
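
The decomposition (6.1.50) also holds exactly when expectations are replaced by averages over a finite sample, provided the covariance is computed with divisor $N$. The following sketch, assuming NumPy and arbitrary simulated data, uses that fact as a check.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 1000

# N realizations of the random vector A, one per row.
A = rng.normal(loc=[1.0, -2.0, 0.5, 3.0],
               scale=[1.0, 0.5, 2.0, 1.5],
               size=(N, n))

M = rng.normal(size=(n, n))
Phi = M @ M.T                                   # nonrandom symmetric matrix

# Q_k = A_k^T Phi A_k for every realization k.
Q = np.einsum('ki,ij,kj->k', A, Phi, A)

mean_A = A.mean(axis=0)
cov_A = np.cov(A, rowvar=False, bias=True)      # divisor N, not N - 1

lhs = Q.mean()
rhs = mean_A @ Phi @ mean_A + np.trace(Phi @ cov_A)

print(lhs, rhs)
```

The first term on the right is the quadratic form evaluated at the mean; the trace term is the extra contribution from the scatter of $\mathbf{A}$ about its mean.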





6.1.5 Model in Matrix Terms 

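Certain conditions regarding the model and assumptions can be expressed
more generally using matrix rather than algebraic notation. Let the dependent
variable $\boldsymbol{\eta}$ be given by the linear-in-parameters model

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} \tag{6.1.51}$$

where for the same dependent variable measured $n$ times, the dimensions
of $\boldsymbol{\eta}$ are $[n \times 1]$, and those of $\mathbf{X}$ are $[n \times p]$. We term $\mathbf{X}$ the sensitivity
matrix; it is a matrix of derivatives with respect to the parameters. The
parameter vector $\boldsymbol{\beta}$ is $[p \times 1]$. Thus we have

$$\boldsymbol{\eta} = \begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} \tag{6.1.52}$$

Equation 6.1.51 includes all the models given in Chapter 5. For example,
for Model 5 ($\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$) the $\mathbf{X}$ matrix is

$$\mathbf{X} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \\ \vdots & \vdots \\ X_{n1} & X_{n2} \end{bmatrix} \tag{6.1.53}$$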


It is possible that the $X_{ij}$ terms could represent different functions of the
same variable such as $t_i$. Some examples are

$$X_{i1} = 1, \quad X_{i2} = t_i, \quad X_{i3} = t_i^2, \quad X_{i4} = t_i^3$$

= = X,j = sm6t, 

In other cases $X_{ij}$ might be composed of different coordinates such as time
$t_i$ and position $z_i$,

$$X_{i1} = t_i, \quad X_{i2} = z_i$$

Similar to the last case, $\eta$ might represent the output of a chemical process
which might be a linear function of air flow rate $q_i$, cooling water inlet
temperature $T_i$, and acid concentration $C_i$; here if there is a constant term
we have

$$X_{i1} = 1, \quad X_{i2} = q_i, \quad X_{i3} = T_i, \quad \text{and} \quad X_{i4} = C_i$$

This same example involving these independent variables is given in
Section 13.12 of Brownlee [3], Chapter 6 of Draper and Smith [4], and
Chapter 5 of Daniel and Wood [5].
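
To make the construction of the sensitivity matrix concrete, the sketch below builds $\mathbf{X}$ for the cubic-in-time case, where column $j$ holds $t_i^{\,j-1}$, and evaluates $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$. NumPy is assumed, and the times and parameter values are invented for illustration only.

```python
import numpy as np

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])           # hypothetical measurement times

# Columns are the functions multiplying each parameter: 1, t, t^2, t^3.
X = np.column_stack([np.ones_like(t), t, t**2, t**3])   # n x p with p = 4

beta = np.array([1.0, -2.0, 0.5, 0.1])            # hypothetical parameter values
eta = X @ beta                                    # model values, n x 1

print(X.shape, eta)
```

Each column of $\mathbf{X}$ is the derivative of $\boldsymbol{\eta}$ with respect to one parameter, which is why the model is linear in the parameters even though it is nonlinear in $t$.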

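Another situation that can be represented by $\boldsymbol{\eta}$ in (6.1.51) is called the
multiresponse case in the statistical literature. In this case $\boldsymbol{\eta}$ and $\mathbf{X}$ may be
given by

$$\boldsymbol{\eta} = \begin{bmatrix} \boldsymbol{\eta}(1) \\ \boldsymbol{\eta}(2) \\ \vdots \\ \boldsymbol{\eta}(n) \end{bmatrix} \quad \text{where} \quad \boldsymbol{\eta}(i) = \begin{bmatrix} \eta_1(i) \\ \eta_2(i) \\ \vdots \\ \eta_m(i) \end{bmatrix} \tag{6.1.54}$$

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}(1) \\ \mathbf{X}(2) \\ \vdots \\ \mathbf{X}(n) \end{bmatrix} \quad \text{where} \quad \mathbf{X}(i) = \begin{bmatrix} X_{11}(i) & \cdots & X_{1p}(i) \\ \vdots & & \vdots \\ X_{m1}(i) & \cdots & X_{mp}(i) \end{bmatrix} \tag{6.1.55}$$

Note that the variable $X_{jk}(i)$, which is called a sensitivity coefficient, is for
the $j$th dependent variable in $\boldsymbol{\eta}$, for the $k$th parameter, and at the $i$th time.
Also observe that

$$X_{jk}(i) = \frac{\partial \eta_j(i)}{\partial \beta_k}, \qquad i = 1, \ldots, n; \quad j = 1, \ldots, m; \quad k = 1, \ldots, p \tag{6.1.56}$$

In the above definitions of $\boldsymbol{\eta}$ and $\mathbf{X}$ in (6.1.54) and (6.1.55) there is a total
of $n$ "times" and $m$ dependent variables, resulting in $\boldsymbol{\eta}$ being an $[mn \times 1]$
vector and $\mathbf{X}$ having dimensions of $[mn \times p]$. If $m = 1$ then the $j$ subscripts
in (6.1.54)-(6.1.56) are dropped and replaced by $i$, or $\eta_1(i) = \eta_i$ and
$X_{1k}(i) = X_{ik}$.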

Multiresponse cases commonly arise in parameter estimation problems
involving ordinary and partial differential equations. An example involving
ordinary differential equations occurs in chemical engineering when the
concentrations of several components present in a chemical reaction are
measured as a function of time. To be more precise, let the concentrations
of components A and B be designated $C_A(t_i)$ and $C_B(t_i)$, and then we
would set

$$\eta_1(i) = C_A(t_i), \qquad \eta_2(i) = C_B(t_i)$$

Unfortunately in most such cases the model is nonlinear in terms of the
parameters, which are usually rate constants in the reaction problem just
mentioned. For didactic purposes, however, let $C_A(t_i)$ and $C_B(t_i)$ be given
by

$$C_A(t_i) = \beta_1 f_1(t_i) + \beta_2 f_2(t_i)$$

$$C_B(t_i) = \beta_1 f_3(t_i) + \beta_3 f_4(t_i)$$

where the $f_j(t_i)$ are known functions. Then the sensitivity coefficients are

$$\mathbf{X}(i) = \begin{bmatrix} f_1(t_i) & f_2(t_i) & 0 \\ f_3(t_i) & 0 & f_4(t_i) \end{bmatrix}$$

An example involving the partial differential equation describing heat
conduction is heat transfer in a plate of thickness $L$ with temperatures
measured at several locations as a function of time. Suppose that the
observations at $m$ positions $x_j$ have been made at times $t_1, \ldots, t_n$. The
temperatures, which would be the dependent variables, are designated
$T(x_j, t_i)$ and then

$$\eta_j(i) = T(x_j, t_i) \quad \text{for } j = 1, \ldots, m$$

For the boundary conditions of a known constant heat flux of $q_0$ at $x = 0$
and no heat flow at $x = L$, the temperature distribution for large times is
given by

$$T(x_j, t_i) = T_0 + \beta_1 f_1(i) + \beta_2 f_2(j)$$

where

$$\beta_1 = (\rho c)^{-1}, \qquad \beta_2 = k^{-1}$$

and $f_1(i)$ and $f_2(j)$ are known functions of time and position, respectively.
The quantity $\rho$ is density, $c$ is specific heat, and $k$ is thermal conductivity.
The initial value (i.e., at $t = 0$) of $T$ is $T_0$, which is known. If there are
measurements at two locations (such as at $x = 0$ and $x = L$), $m = 2$ and the
$\eta_j(i)$ values are

$$\eta_j(i) = T_0 + \beta_1 f_1(i) + \beta_2 f_2(j), \qquad j = 1, 2$$

and the sensitivity coefficients for $i = 1, \ldots, n$ are

$$\mathbf{X}(i) = \begin{bmatrix} f_1(i) & f_2(1) \\ f_1(i) & f_2(2) \end{bmatrix}$$

Actually this case could also be considered a single response, but multiple
sensors are required for $m > 1$. Many other examples can be generated.

6.1.5.1 Identifiability Condition

In order to uniquely estimate all $p$ parameters in the $\boldsymbol{\beta}$ vector it is necessary
that

$$|\mathbf{X}^T \mathbf{X}| \neq 0 \tag{6.1.57}$$

for estimation problems involving least squares or maximum likelihood.
This criterion is analogous to those given by (5.1.2) for some simple
models. When maximum a posteriori estimation is used, it may not be
necessary that (6.1.57) be true. See Appendix A.
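
The identifiability condition can be tested numerically before any estimation is attempted. The sketch below assumes NumPy and takes the condition to be that the determinant of $\mathbf{X}^T\mathbf{X}$ is nonzero; the function name and tolerance are hypothetical. The second design has proportional columns, so its two parameters cannot be estimated separately.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])

X_ok = np.column_stack([np.ones_like(t), t])     # columns 1 and t
X_bad = np.column_stack([t, 2.0 * t])            # second column is twice the first

def identifiable(X, tol=1e-10):
    """True when |X^T X| is not (numerically) zero."""
    return abs(np.linalg.det(X.T @ X)) > tol

print(identifiable(X_ok), identifiable(X_bad))
```

In practice a determinant near zero is scale dependent, so the rank or the condition number of $\mathbf{X}$ is often a more robust diagnostic than a fixed tolerance on the determinant.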

6.1.5.2 Assumptions

The standard assumptions given at the end of Section 5.1 are repeated and
amplified below. In making this list all cases considered in this text are
included, but doubtless many other possibilities exist that are not explicitly
covered below.

1. $\mathbf{Y} = E(\mathbf{Y}|\boldsymbol{\beta}) + \boldsymbol{\varepsilon} = \boldsymbol{\eta}(\mathbf{X}, \boldsymbol{\beta}) + \boldsymbol{\varepsilon}$ (additive errors in measurements) (6.1.58)

0 No, measurement errors are not additive.

1 Yes, measurement errors are additive. Also the regression function
$\boldsymbol{\eta}$ is correct and does not involve any random variables.

2. $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ (zero mean measurement errors) (6.1.59)

0 No, the errors in $\mathbf{Y}$ do not have zero means. In other words the
errors are biased.

1 Yes, the errors in $\mathbf{Y}$ have zero mean.

3. $V(Y_i|\boldsymbol{\beta}) = \sigma^2$ (constant variance errors) (6.1.60)

0 No, $V(Y_i|\boldsymbol{\beta}) \neq$ constant; that is, $V(Y_i|\boldsymbol{\beta}) = \sigma_i^2$.

1 Yes, the errors $\varepsilon_i$ have a common variance.

4. $E\{[\varepsilon_i - E(\varepsilon_i)][\varepsilon_j - E(\varepsilon_j)]\} = 0$ for $i \neq j$ (uncorrelated errors) (6.1.61)

0 No, the errors $\varepsilon_i$ are correlated.

1 Yes, the errors $\varepsilon_i$ are uncorrelated.



6.1 INTRODUCTION TO MATRIX NOTATION AND OPERATIONS 


229 


2 No, the errors are described by an autoregressive process. See 
Section 6.9 

5. $\boldsymbol{\varepsilon}$ has a normal distribution (normality) (6.1.62)

0 No, the distribution is not normal.

1 Yes, the distribution is normal.

6. Known statistical parameters describing $\boldsymbol{\varepsilon}$ (6.1.63)

0 No, $\boldsymbol{\psi} = \mathrm{cov}(\boldsymbol{\varepsilon})$ is known only to within an arbitrary multiplicative
constant, or $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ where $\sigma^2$ is unknown and $\boldsymbol{\Omega}$ is known.

1 Yes, $\boldsymbol{\psi} = \mathrm{cov}(\boldsymbol{\varepsilon})$ is completely known.

2 No, but $\boldsymbol{\varepsilon}$ is known to be described by an autoregressive process
although the associated parameters are unknown. See Section 6.9.

3 No, $\boldsymbol{\psi} = \mathrm{cov}(\boldsymbol{\varepsilon})$ is completely unknown.

7. Errorless independent variables [$V(X_{ij}) = 0$ for linear models] (6.1.64)

0 No, there are errors in the independent variables.

1 Yes, there are no errors in the independent variables.

8. $\boldsymbol{\beta}$ is a constant parameter vector and there is no prior information
(nature of parameters) (6.1.65)

0 No, $\boldsymbol{\beta}$ is a random parameter vector and the statistics of $\boldsymbol{\beta}$ are
unknown.

1 Yes, $\boldsymbol{\beta}$ is a constant parameter vector and there is no prior
information.

2 $\boldsymbol{\beta}$ is a random parameter vector with a known mean of $\boldsymbol{\mu}_\beta$ and
known covariance matrix $\mathbf{V}_\beta$. Also $\boldsymbol{\beta}$ is normal. The random
vectors $\boldsymbol{\beta}$ and $\boldsymbol{\varepsilon}$ are uncorrelated. All the measurements are
considered to be from the same "batch."

3 $\boldsymbol{\beta}$ is a constant but unknown vector but there is subjective prior
information. Since $\boldsymbol{\beta}$ is unknown, the prior information regarding $\boldsymbol{\beta}$
is summarized by $\boldsymbol{\beta}$ being $N(\boldsymbol{\mu}_\beta,\mathbf{V}_\beta)$ and by $\mathrm{cov}(\beta_i,\varepsilon_j) = 0$.

As indicated in Chapter 5, the eight standard assumptions are denoted
11111111.

The estimation problems are generally less difficult (if the "best" estimates
and their variances are of interest) when the standard assumptions
are valid. If, for example, the covariance matrix $\boldsymbol{\psi}$ of the errors is completely
unknown and there are no repeated observations, the complete $\boldsymbol{\psi}$
matrix cannot be estimated along with the parameters. From repeated
observations, investigations of the measuring devices, and familiarity with
estimation in such cases, the $\boldsymbol{\psi}$ matrix gradually becomes better known,
however. Then estimators other than OLS can be used, although a nondiagonal
$\boldsymbol{\psi}$ does introduce complications; see Sections 5.13 and 6.9. If $\boldsymbol{\psi}$ is
completely unknown in a multiresponse case, Box and Draper [6] recommend
the use of a method other than OLS; see (6.1.73a).



6.1.6 Maximum Likelihood Sum of Squares Functions

Suppose that the measurement errors are additive and have zero mean, a
normal distribution, known statistical parameters, and errorless independent
variables. Also there is no prior information. This case is designated
11--1111.

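The probability density function for the above assumptions is

$$f(\mathbf{Y}|\boldsymbol{\beta}) = (2\pi)^{-N/2}\,|\boldsymbol{\psi}|^{-1/2}\exp\left[-(\mathbf{Y}-\boldsymbol{\eta})^T\boldsymbol{\psi}^{-1}(\mathbf{Y}-\boldsymbol{\eta})/2\right] \tag{6.1.66}$$

where $\boldsymbol{\psi}^{-1}$ is the inverse of $\boldsymbol{\psi}$ given by (6.1.38) and N is the total number of
observations. Before the data are available, $f(\mathbf{Y}|\boldsymbol{\beta})$, the probability density
of Y given $\boldsymbol{\beta}$, associates a probability density with each different outcome
Y of the experiment for a fixed parameter vector $\boldsymbol{\beta}$. After the data are
available we wish to find the various values in $\boldsymbol{\beta}$ which might have led to
the set of measurements Y actually obtained. In the maximum likelihood
method the likelihood function $L(\boldsymbol{\beta}|\mathbf{Y})$ is maximized. $L(\boldsymbol{\beta}|\mathbf{Y})$ has the same
form as $f(\mathbf{Y}|\boldsymbol{\beta})$ but now $\boldsymbol{\beta}$ is considered variable and Y is fixed.
The procedure for forming estimators from $L(\boldsymbol{\beta}|\mathbf{Y})$ usually first involves
taking the natural logarithm of (6.1.66) to get

$$l(\boldsymbol{\beta}|\mathbf{Y}) = \ln L(\boldsymbol{\beta}|\mathbf{Y}) = -\tfrac{1}{2}\left[N\ln(2\pi) + \ln|\boldsymbol{\psi}| + S_{ML}\right] \tag{6.1.67a}$$

where

$$S_{ML} = (\mathbf{Y}-\boldsymbol{\eta})^T\boldsymbol{\psi}^{-1}(\mathbf{Y}-\boldsymbol{\eta}) \tag{6.1.67b}$$

When $\boldsymbol{\psi}$ is known, as assumed above, maximizing $L(\boldsymbol{\beta}|\mathbf{Y})$ is equivalent to
minimizing $S_{ML}$. Several possible algebraic forms of $S_{ML}$ are given next.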

6.1.6.1 Single Dependent Variable (Single Response) Case

Let $\eta$ represent a single dependent variable that is measured N = n
different times or conditions. For correlated errors $\boldsymbol{\psi}^{-1}$ is not diagonal.
Let the inverse of $\boldsymbol{\psi}$ be given by the symmetric matrix W with components
$w_{ij}$. For this case $S_{ML}$ is given in algebraic form by

$$S_{ML} = \sum_{i=1}^{n}\sum_{j=1}^{n}(Y_i-\eta_i)(Y_j-\eta_j)\,w_{ij} \tag{6.1.68a}$$

If $\boldsymbol{\psi}$ is diagonal with components $\sigma_i^2$, then

$$S_{ML} = \sum_{i=1}^{n}(Y_i-\eta_i)^2\,\sigma_i^{-2} \tag{6.1.68b}$$

For further discussion of $S_{ML}$ for single response cases, see Section 6.5.



6.1.6.2 Several Dependent Variables (Multiresponse) Case

Suppose that m different dependent variables are measured at n different 
discrete instants. For example, temperature and concentration might be 
recorded with time. The formulation below is also appropriate if the same 
physical quantity, such as temperature, is dynamically measured utilizing 
several thermocouples each positioned at a different location. 

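In multiresponse cases, the notation given by (6.1.54) is used for $\boldsymbol{\eta}$ and a
similar expression is used for the mn observation column vector, Y. Let the
inverse of $\boldsymbol{\psi}$ be W, which is partitioned into a symmetric n×n matrix with
components being m×m matrices,

$$\mathbf{W} = \begin{bmatrix} \mathbf{W}(1,1) & \mathbf{W}(1,2) & \cdots & \mathbf{W}(1,n) \\ \vdots & \vdots & & \vdots \\ \mathbf{W}(1,n) & \mathbf{W}(2,n) & \cdots & \mathbf{W}(n,n) \end{bmatrix} \tag{6.1.69}$$

Then the maximum likelihood sum of squares function is

$$S_{ML} = \sum_{i=1}^{n}\sum_{j=1}^{n}[\mathbf{Y}(i)-\boldsymbol{\eta}(i)]^T\,\mathbf{W}(i,j)\,[\mathbf{Y}(j)-\boldsymbol{\eta}(j)] \tag{6.1.70}$$

which can be simplified further for independent errors. For example, let
the errors be zero mean and independent in "time," that is, $E[\boldsymbol{\varepsilon}(i)\boldsymbol{\varepsilon}^T(j)] = \mathbf{0}$
for $i \neq j$. Then (6.1.70) reduces to

$$S_{ML} = \sum_{i=1}^{n}[\mathbf{Y}(i)-\boldsymbol{\eta}(i)]^T\,\mathbf{W}(i,i)\,[\mathbf{Y}(i)-\boldsymbol{\eta}(i)] \tag{6.1.71}$$

If, in addition, the $\varepsilon_j(i)$ values are independent for a given i, that is,
$E[\varepsilon_j(i)\varepsilon_k(i)] = 0$ for $j \neq k$, $S_{ML}$ further simplifies to

$$S_{ML} = \sum_{j=1}^{m}\sum_{i=1}^{n}[Y_j(i)-\eta_j(i)]^2\,\sigma_j^{-2}(i) \tag{6.1.72}$$

since $W_{jj}(i,i) = \sigma_j^{-2}(i)$.

In this section, $S_{ML}$ is given with $\boldsymbol{\psi}$ and its inverse assumed known.
Exactly the same estimates for the physical parameters ($\boldsymbol{\beta}$) are obtained if
$\boldsymbol{\psi}$ is known only to within a multiplicative constant, that is, $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ where
$\sigma^2$ is unknown but $\boldsymbol{\Omega}$ is known. This case is designated 11--1011. Should
the $\boldsymbol{\psi}$ matrix contain more unknown parameters than $\sigma^2$, the ML procedure
becomes more complicated. See Section 6.9.5.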



When the $\boldsymbol{\psi}$ matrix is completely unknown, Box and Draper [6] recommend
for the multiresponse case that the following determinant be minimized,

(6.1.73a)

where the matrix appearing in the determinant is given by

(6.1.73b)

For m = 1, this reduces to ordinary least squares. For m ≥ 2 the minimization
of (6.1.73a) must be accomplished by the solution of a set of nonlinear
equations even when the model is linear in the parameters. For other
discussion of multiresponse cases, see Sections 6.7.5 and 7.8 and the
Hunter paper [7].


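6.1.7 Gauss-Markov Theorem

An important result in estimation is the Gauss-Markov theorem: if $\boldsymbol{\eta}$
is a linear regression function, if errors are additive and have zero mean, if
the covariance matrix of $\boldsymbol{\varepsilon}$, $\boldsymbol{\psi}$, is known to within a multiplicative constant
($\boldsymbol{\psi} = \boldsymbol{\Omega}\sigma^2$) and is positive definite, if there is no error in the independent
variable, and if the estimation procedure does not use prior information
(assumptions 11---011), then of all estimators of any component $\beta_i$
of $\boldsymbol{\beta}$ which are unbiased and which are linear combinations of the
observations, the component $b_{i,MV}$ of $\mathbf{b}_{MV}$ has minimum variance, where

$$\mathbf{b}_{MV} = (\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{Y} \tag{6.1.74}$$

The covariance matrix of $\mathbf{b}_{MV}$ is

$$\mathrm{cov}(\mathbf{b}_{MV}) = (\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\sigma^2 \tag{6.1.75}$$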

Since we are usually more interested in the use of our estimates of $\boldsymbol{\beta}$ in
the prediction of Y or of other linear combinations of the components of $\boldsymbol{\beta}$
than in the individual components of $\boldsymbol{\beta}$, we shall prove a stronger theorem
which includes the one just stated, namely: under the conditions stated
above, of all unbiased estimators of any particular linear combination of the
components of $\boldsymbol{\beta}$, $\boldsymbol{\ell}^T\boldsymbol{\beta}$, say, which can be expressed as a linear combination
of the observations, the one with minimum variance is $\boldsymbol{\ell}^T\mathbf{b}_{MV}$.
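To start the proof, we note that $\mathbf{b}_{MV}$ is simply expressible in terms of $\boldsymbol{\beta}$
and $\boldsymbol{\varepsilon}$,

$$\mathbf{b}_{MV} = (\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Omega}^{-1}(\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}) = \boldsymbol{\beta} + \mathbf{A}\boldsymbol{\varepsilon} \tag{6.1.76}$$

where

$$\mathbf{A} = (\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Omega}^{-1} \tag{6.1.77}$$

so that

$$E(\mathbf{b}_{MV}) = \boldsymbol{\beta} + \mathbf{A}E(\boldsymbol{\varepsilon}) = \boldsymbol{\beta} \tag{6.1.78}$$

or in words, $\mathbf{b}_{MV}$ is an unbiased estimator of $\boldsymbol{\beta}$. The covariance matrix of
$\mathbf{b}_{MV}$ is

$$\mathrm{cov}(\mathbf{b}_{MV}) = E[(\mathbf{A}\boldsymbol{\varepsilon})(\mathbf{A}\boldsymbol{\varepsilon})^T] = E[\mathbf{A}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\mathbf{A}^T] = \mathbf{A}\boldsymbol{\Omega}\mathbf{A}^T\sigma^2 = (\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\sigma^2 \tag{6.1.79}$$

Consider now the scalar $\boldsymbol{\ell}^T\mathbf{b}_{MV}$. It follows from (6.1.78) and (6.1.79) that

$$E(\boldsymbol{\ell}^T\mathbf{b}_{MV}) = \boldsymbol{\ell}^T\boldsymbol{\beta} \tag{6.1.80}$$

and

$$\mathrm{cov}(\boldsymbol{\ell}^T\mathbf{b}_{MV}) = \boldsymbol{\ell}^T(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\boldsymbol{\ell}\,\sigma^2 \tag{6.1.81}$$

Next we consider any unbiased linear estimator of $\boldsymbol{\ell}^T\boldsymbol{\beta}$. It can be written
as

$$\boldsymbol{\ell}^T\mathbf{b}_{MV} + \mathbf{C}^T\mathbf{Y} \tag{6.1.82}$$

where C is a vector of constants.

Since $\boldsymbol{\ell}^T\mathbf{b}_{MV} + \mathbf{C}^T\mathbf{Y}$ is unbiased and since

$$E(\boldsymbol{\ell}^T\mathbf{b}_{MV} + \mathbf{C}^T\mathbf{Y}) = \boldsymbol{\ell}^T\boldsymbol{\beta} + \mathbf{C}^T\mathbf{X}\boldsymbol{\beta} \tag{6.1.83}$$

we see that $\mathbf{C}^T\mathbf{X}\boldsymbol{\beta}$ must be zero. Since, by definition of unbiased
estimator, this condition cannot depend on the value of $\boldsymbol{\beta}$, $\mathbf{C}^T\mathbf{X}$ must be a
null vector. The variance of $\boldsymbol{\ell}^T\mathbf{b}_{MV} + \mathbf{C}^T\mathbf{Y}$ is

$$\begin{aligned} V(\boldsymbol{\ell}^T\mathbf{b}_{MV} + \mathbf{C}^T\mathbf{Y}) &= E\left(\boldsymbol{\ell}^T(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\boldsymbol{\Omega}^{-1}\mathbf{X}(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\boldsymbol{\ell}\right) \\ &\quad + E(\mathbf{C}^T\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\mathbf{C}) + 2E\left(\mathbf{C}^T\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\boldsymbol{\Omega}^{-1}\mathbf{X}(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\boldsymbol{\ell}\right) \\ &= \boldsymbol{\ell}^T(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\boldsymbol{\ell}\,\sigma^2 + \mathbf{C}^T\boldsymbol{\Omega}\mathbf{C}\,\sigma^2 + 2\mathbf{C}^T\mathbf{X}(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\boldsymbol{\ell}\,\sigma^2 \end{aligned} \tag{6.1.84}$$

Since $\mathbf{C}^T\mathbf{X} = \mathbf{0}$, the third term is 0. The first term is unaffected by a choice
of C. Since $\boldsymbol{\Omega}$ is positive definite, $\mathbf{C}^T\boldsymbol{\Omega}\mathbf{C}$ is positive unless C is the null
vector. Thus no unbiased linear estimator of $\boldsymbol{\ell}^T\boldsymbol{\beta}$ has a variance less than
the variance of $\boldsymbol{\ell}^T\mathbf{b}_{MV}$. Hence $\boldsymbol{\ell}^T\mathbf{b}_{MV}$ is a minimum variance unbiased
linear estimator of $\boldsymbol{\ell}^T\boldsymbol{\beta}$, and in particular $b_{i,MV}$ is a minimum variance
unbiased linear estimator of $\beta_i$.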
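
The minimum variance estimator (6.1.74) and the matrix $(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}$ of (6.1.75) can be evaluated numerically as in the following minimal sketch. The function name `min_variance_estimate` and the small data set are hypothetical and serve only as an illustration; NumPy is assumed to be available.

```python
import numpy as np

def min_variance_estimate(X, Y, Omega):
    """Return b_MV = (X^T Omega^-1 X)^-1 X^T Omega^-1 Y and (X^T Omega^-1 X)^-1."""
    Oinv_X = np.linalg.solve(Omega, X)      # Omega^-1 X without forming Omega^-1
    P = np.linalg.inv(X.T @ Oinv_X)         # (X^T Omega^-1 X)^-1
    # Omega is symmetric, so (Omega^-1 X)^T equals X^T Omega^-1
    b = P @ (Oinv_X.T @ Y)
    return b, P

# Hypothetical illustration: straight line, error variances growing with t
t = np.arange(5.0)
X = np.column_stack([np.ones_like(t), t])
Omega = np.diag([1.0, 2.0, 4.0, 8.0, 16.0])   # psi = Omega * sigma^2, sigma^2 unknown
beta_true = np.array([2.0, 0.5])
Y = X @ beta_true                             # error-free observations

b_mv, P = min_variance_estimate(X, Y, Omega)
A = P @ np.linalg.solve(Omega, X).T           # A of (6.1.77)
```

With error-free observations the estimator returns the true parameters, and the product AX is the identity matrix, which is the algebraic fact behind the unbiasedness result (6.1.78).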

6.2 LEAST SQUARES ESTIMATION

In order to obtain parameter estimates using ordinary least squares (OLS),
none of the standard assumptions need be valid. Instead a prescribed sum
of squares is always minimized. When nothing is known regarding the
measurement errors, OLS is recommended. The reader should be made
aware, however, that he may know more about the measurement errors
than he first thinks. After analyzing similar sets of data, for example, he
can learn much regarding the errors.
When information regarding the measurement errors is present and $\sigma_i^2$ is
far from constant, some estimation procedure other than OLS is recommended.


6.2.1 Ordinary Least Squares Estimator (OLS)

The sum of squares function used for ordinary least squares with the linear
model is

$$S_{LS} = (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}) \tag{6.2.1}$$

which is a quadratic form. The principle of least squares asserts that a set
of estimates of parameters can be obtained by minimizing $S_{LS}$. This is
accomplished by setting the matrix derivative of $S_{LS}$ with respect to $\boldsymbol{\beta}$
equal to zero. Using (6.1.30) and (6.1.26b) we get

$$\nabla_\beta S_{LS} = 2[\nabla_\beta(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T][\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}] \tag{6.2.2}$$

$$\nabla_\beta(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T = -\nabla_\beta\boldsymbol{\beta}^T\mathbf{X}^T = -\mathbf{X}^T \tag{6.2.3}$$

and thus (6.2.2) equated to zero at $\boldsymbol{\beta} = \mathbf{b}_{LS}$ produces

$$-\mathbf{X}^T\mathbf{Y} + \mathbf{X}^T\mathbf{X}\mathbf{b}_{LS} = \mathbf{0} \tag{6.2.4}$$

Premultiplying by the inverse $(\mathbf{X}^T\mathbf{X})^{-1}$ results in the ordinary least squares
estimator,

$$\mathbf{b}_{LS} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \tag{6.2.5}$$

This estimator requires for unique estimation of all the p parameters that
the p×p matrix $\mathbf{X}^T\mathbf{X}$ be nonsingular, or $|\mathbf{X}^T\mathbf{X}| \neq 0$. This means that any one
column in X cannot be proportional to any other column or any linear
combination of other columns because if such a proportionality (i.e., linear
dependence) exists, $|\mathbf{X}^T\mathbf{X}| = 0$. (See Appendix A at the end of the text.) The
condition $|\mathbf{X}^T\mathbf{X}| \neq 0$ also requires that n, the number of measurements of
$Y_i$, be equal to or greater than the number of parameters p. If the predicted
curve $\hat{Y}_i$ is not to pass through each observation it is further necessary that
$n \geq p+1$.
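
As a minimal sketch of (6.2.4) and (6.2.5), the normal equations can be solved directly, with a rank check standing in for the condition $|\mathbf{X}^T\mathbf{X}| \neq 0$. The function name `ols` and the four data points are hypothetical; NumPy is assumed.

```python
import numpy as np

def ols(X, Y):
    """Solve the normal equations X^T X b = X^T Y of (6.2.4) for b_LS."""
    XtX = X.T @ X
    # A rank-deficient X^T X means the columns of X are linearly dependent,
    # so not all p parameters can be estimated uniquely.
    if np.linalg.matrix_rank(XtX) < X.shape[1]:
        raise ValueError("X^T X is singular: columns of X are linearly dependent")
    return np.linalg.solve(XtX, X.T @ Y)

# Hypothetical error-free data lying on the line Y = 1 + 2 t
t = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(t), t])
Y = 1.0 + 2.0 * t
b_ls = ols(X, Y)

# Two proportional columns violate the identifiability condition
X_bad = np.column_stack([t, 2.0 * t])
try:
    ols(X_bad, Y)
    singular_detected = False
except ValueError:
    singular_detected = True
```

Solving the linear system rather than forming the inverse explicitly is a numerical design choice made here; it gives the same estimator as (6.2.5) whenever the inverse exists.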

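Estimators obtained from (6.2.5) for Model 2, $\eta_i = \beta X_i$, and Model 5,
$\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, are those obtained in Chapter 5; in particular see Table
5.3. For the single response case and using X given by (6.1.52), $\mathbf{X}^T\mathbf{X}$ and
$\mathbf{X}^T\mathbf{Y}$ are given by

$$\mathbf{X}^T\mathbf{X} = \begin{bmatrix} \sum X_{i1}^2 & \sum X_{i1}X_{i2} & \cdots & \sum X_{i1}X_{ip} \\ & \sum X_{i2}^2 & \cdots & \sum X_{i2}X_{ip} \\ & & \ddots & \vdots \\ \text{symmetric} & & & \sum X_{ip}^2 \end{bmatrix} \tag{6.2.6}$$

$$\mathbf{X}^T\mathbf{Y} = \begin{bmatrix} \sum X_{i1}Y_i \\ \vdots \\ \sum X_{ip}Y_i \end{bmatrix} \tag{6.2.7}$$

where the summations are from 1 to n.

There are many possible applications of OLS. One is to find a curve
which is a "best" fit to some actual data. Another is to find parameters
having physical significance. One can also use OLS to pass an approximate
curve through some points that have been analytically derived.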

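Example 6.2.1

Give the components of the $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{Y}$ matrices for the model
$\eta_i = \beta_1 + \beta_2 t_i + \beta_3 t_i^2$.

Solution

In this example X is

$$\mathbf{X} = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_n & t_n^2 \end{bmatrix}$$

and thus

$$\mathbf{X}^T\mathbf{X} = \begin{bmatrix} n & \sum t_i & \sum t_i^2 \\ & \sum t_i^2 & \sum t_i^3 \\ \text{symmetric} & & \sum t_i^4 \end{bmatrix}, \qquad \mathbf{X}^T\mathbf{Y} = \begin{bmatrix} \sum Y_i \\ \sum t_iY_i \\ \sum t_i^2Y_i \end{bmatrix}$$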


Example 6.2.2

Many experimentally based heat transfer correlations are given by the Nusselt
number (Nu) as a function of the Reynolds (Re) and Prandtl (Pr) numbers. One
such correlation is for flow normal to heated cylinders and wires. Even though Nu
may vary several orders of magnitude, typically the variation in the measurements
for Nu is ±30% irrespective of the magnitude of Nu. On a log-log plot of Nu
versus Re for the geometry mentioned, the data appear to be described by a second
degree curve and the variance of log Nu is nearly constant. For the values given
below by Welty [8, p. 268], use OLS to find the parameters in the
linear-in-the-parameters model

$$\log \mathrm{Nu} = \beta_1 + \beta_2\log \mathrm{Re} + \beta_3(\log \mathrm{Re})^2 + \varepsilon$$

Also give Nu as a function of Re and compare with the values given below.

Re    0.1    1      10     10^2   10^3   10^4   10^5
Nu    0.45   0.84   1.83   5.1    15.7   56.5   245

Solution

The model can be written in the more familiar form

$$Y = \beta_1 + \beta_2 t + \beta_3 t^2 + \varepsilon = \eta + \varepsilon$$

where $Y = \log \mathrm{Nu}$, $\eta = E(Y)$, and $t = \log \mathrm{Re}$. In terms of Y and t, the data are

i       1        2        3       4       5      6      7
t_i    -1        0        1       2       3      4      5
Y_i    -0.347   -0.076    0.263   0.708   1.20   1.75   2.39

which are equally spaced in t. The $\mathbf{X}^T$ matrix is given by

$$\mathbf{X}^T = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 & 3 & 4 & 5 \\ 1 & 0 & 1 & 4 & 9 & 16 & 25 \end{bmatrix}$$

Note that the rows in $\mathbf{X}^T$ (or columns of X) are not proportional or even nearly so.
This indicates that little difficulty will be encountered due to $\mathbf{X}^T\mathbf{X}$ being nearly
singular. Using the results of Example 6.2.1, the $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{Y}$ matrices are found
to be

$$\mathbf{X}^T\mathbf{X} = \begin{bmatrix} 7 & 14 & 56 \\ 14 & 56 & 224 \\ 56 & 224 & 980 \end{bmatrix}, \qquad \mathbf{X}^T\mathbf{Y} = \begin{bmatrix} 5.885 \\ 24.57 \\ 101.3 \end{bmatrix}$$

The inverse of the symmetric $\mathbf{X}^T\mathbf{X}$ matrix is the symmetric matrix [see (6.1.11)],

$$(\mathbf{X}^T\mathbf{X})^{-1} = \frac{1}{16464}\begin{bmatrix} 4704 & -1176 & 0 \\ -1176 & 3724 & -784 \\ 0 & -784 & 196 \end{bmatrix}$$

Then from $\mathbf{b}_{LS} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ we find the estimated parameters to be

$$\mathbf{b}_{LS}^T = [\,-0.0734 \quad 0.314 \quad 0.0358\,]$$

In order to express the model in terms of Nu, let $\beta_1 = \log B$, which produces
B = 0.844. With B in the log Nu expression we can write

$$\widehat{\mathrm{Nu}} = B\,\mathrm{Re}^{\,b_2 + b_3\log \mathrm{Re}} = 0.844\,\mathrm{Re}^{\,0.314 + 0.0358\log \mathrm{Re}}$$

Values obtained using this expression are given in the third column of Table 6.1. A
comparison of the recommended and $\widehat{\mathrm{Nu}}$ values shows an agreement within ±3%.
Note that the largest residual ($\mathrm{Nu} - \widehat{\mathrm{Nu}}$) occurs at Re = 10^5 and has magnitude 0.9,
which is much larger than those for small Re. If an OLS analysis on Nu (and not log Nu)
had been used, the magnitude of the residuals would have been more uniform over
the Nu range, causing the relative differences in columns two and three of Table
6.1 to be much larger for the small Nu values than the large values. Hence in this
case, in which the relative differences in the given and predicted values of Nu were
nearly constant, OLS estimation on log Nu produced the desired result
whereas OLS on Nu would not. Furthermore, the transformation from Nu and Re
to log Nu and log Re, respectively, results in a linear estimation problem while the
problem in terms of Nu and Re is nonlinear. These data are considered further in
Example 6.3.1.


Table 6.1 Given and OLS Values for Nu, for Example 6.2.2

Re       Nu, Given Values    Nu, OLS Values
0.1      0.45                0.4452
1        0.84                0.8445
10       1.83                1.889
100      5.1                 4.983
10^3     15.7                15.50
10^4     56.5                56.85
10^5     245                 245.9
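
The calculation of Example 6.2.2 can be retraced with the following short sketch, which uses only the Re and Nu values given in the example. The variable names are illustrative, NumPy is assumed, and the normal equations are solved directly instead of forming the inverse.

```python
import numpy as np

# Data of Example 6.2.2
Re = np.array([0.1, 1.0, 10.0, 1e2, 1e3, 1e4, 1e5])
Nu = np.array([0.45, 0.84, 1.83, 5.1, 15.7, 56.5, 245.0])

t = np.log10(Re)                                   # t = log Re
Y = np.log10(Nu)                                   # Y = log Nu
X = np.column_stack([np.ones_like(t), t, t**2])    # columns 1, t, t^2

XtX = X.T @ X
XtY = X.T @ Y
b_ls = np.linalg.solve(XtX, XtY)                   # b_LS of (6.2.5)

Nu_hat = 10.0 ** (X @ b_ls)                        # back-transform to Nu
rel_diff = (Nu - Nu_hat) / Nu                      # relative differences

for re, nu, nh in zip(Re, Nu, Nu_hat):
    print(f"Re = {re:8.1f}   Nu given = {nu:7.2f}   Nu OLS = {nh:8.4f}")
```

The relative differences stay near the few-percent level over the whole range of Re, which is the behavior the example attributes to fitting log Nu and not Nu itself.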


6.2.2 Mean of the OLS Estimator 

In order to obtain the parameter vector $\mathbf{b}_{LS}$ it is not necessary to invoke
any of the standard assumptions. However, if some statistical statements
are to be made regarding $\mathbf{b}_{LS}$, we must know several facts regarding the
errors. Let there be additive, zero mean errors in Y and let X and $\boldsymbol{\beta}$ be
nonstochastic (11----11). Then the expected value of $\mathbf{b}_{LS}$ is

$$E(\mathbf{b}_{LS}) = E[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}] = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{Y}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta} \tag{6.2.8}$$

With the four assumptions given above, the least squares estimator is thus
shown to be unbiased.


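6.2.3 Variance-Covariance Matrix of $\mathbf{b}_{LS}$

With the use of the four assumptions denoted (11----11) and utilizing
(6.2.8) we can find the covariance of $\mathbf{b}_{LS}$. Let

$$\mathbf{b}_{LS} = \mathbf{A}(\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}) = \boldsymbol{\beta} + \mathbf{A}\boldsymbol{\varepsilon} \quad \text{where } \mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \tag{6.2.9}$$

since $\mathbf{A}\mathbf{X} = \mathbf{I}$. Then $\mathrm{cov}(\mathbf{b}_{LS})$ is

$$\mathrm{cov}(\mathbf{b}_{LS}) = E[(\mathbf{b}_{LS}-\boldsymbol{\beta})(\mathbf{b}_{LS}-\boldsymbol{\beta})^T] = E[(\boldsymbol{\beta}+\mathbf{A}\boldsymbol{\varepsilon}-\boldsymbol{\beta})(\boldsymbol{\beta}+\mathbf{A}\boldsymbol{\varepsilon}-\boldsymbol{\beta})^T] = \mathbf{A}\boldsymbol{\psi}\mathbf{A}^T \tag{6.2.10}$$

since $\boldsymbol{\psi} = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T)$. Utilizing the definition of A then gives

$$\mathrm{cov}(\mathbf{b}_{LS}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \equiv \mathbf{P}_{LS} \tag{6.2.11}$$

in which we also introduce the symbol $\mathbf{P}_{LS}$.

Without the additional standard assumptions of uncorrelated and constant
variance measurement errors (1111--11), the OLS estimator does not
provide the minimum variance estimator. For this reason, $\mathbf{b}_{LS}$ is said to be
not efficient. Suppose that the standard assumptions of 1111--11 are valid;
then $\boldsymbol{\psi}$ is given by

$$\boldsymbol{\psi} = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T) = \sigma^2\mathbf{I}$$

and we have

$$\mathrm{cov}(\mathbf{b}_{LS}) = (\mathbf{X}^T\mathbf{X})^{-1}\sigma^2$$

which is the minimum covariance matrix of $\mathbf{b}_{LS}$, and thus for these assumptions,
which include additive, zero mean, uncorrelated, and constant variance
errors, $\mathbf{b}_{LS}$ does provide an efficient estimator. (See Section 6.1.7 on
the Gauss-Markov theorem.)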
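
A minimal numerical sketch of (6.2.10) and (6.2.11) follows. It forms $\mathbf{A}\boldsymbol{\psi}\mathbf{A}^T$ for a general error covariance matrix and confirms that the result collapses to $(\mathbf{X}^T\mathbf{X})^{-1}\sigma^2$ when $\boldsymbol{\psi} = \sigma^2\mathbf{I}$. The function name `cov_ols` and all numerical values are hypothetical.

```python
import numpy as np

def cov_ols(X, psi):
    """Covariance of b_LS for error covariance psi: A psi A^T, A = (X^T X)^-1 X^T."""
    A = np.linalg.solve(X.T @ X, X.T)    # A of (6.2.9)
    return A @ psi @ A.T                 # (6.2.10), identical to (6.2.11)

# Hypothetical design: straight line through six equally spaced points
t = np.arange(6.0)
X = np.column_stack([np.ones_like(t), t])
sigma2 = 0.25

# Uncorrelated, constant variance errors: psi = sigma^2 I
P_iid = cov_ols(X, sigma2 * np.eye(len(t)))
P_simple = np.linalg.inv(X.T @ X) * sigma2

# Uncorrelated errors whose variance grows with t (assumption 3 violated)
P_hetero = cov_ols(X, np.diag(sigma2 * (1.0 + t)))
```

The same function accepts a nondiagonal `psi`, so correlated errors can be examined without changing the code.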

In addition to needing the covariance of the parameters, one may desire
the covariance matrix of a collection of predicted points on the regression
line or surface. Using (6.1.46) with the assumptions of additive, zero mean
errors and nonstochastic X and $\boldsymbol{\beta}$ (11----11) we find for the set of points
represented by $\hat{\mathbf{Y}} = \mathbf{X}_1\mathbf{b}_{LS}$,

$$\mathrm{cov}(\hat{\mathbf{Y}}) = \mathrm{cov}(\mathbf{X}_1\mathbf{b}_{LS}) = \mathbf{X}_1\,\mathrm{cov}(\mathbf{b}_{LS})\,\mathbf{X}_1^T = \mathbf{X}_1(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}_1^T \tag{6.2.12a}$$

See Fig. 5.7. For independent, constant variance errors this expression
reduces to

$$\mathrm{cov}(\hat{\mathbf{Y}}) = \mathbf{X}_1(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}_1^T\sigma^2 \tag{6.2.12b}$$

The diagonal elements of this square matrix give the variances of $\hat{Y}_i$. For a
typical term of (6.2.12b) see (5.2.41a), which is for the simple model
$\eta_i = \beta_0 + \beta_1X_i$.


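6.2.4 Relations Involving the Sum of Squares of Residuals

For ordinary least squares with the linear model $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$ the minimum sum
of squares is

$$R_{LS} = (\mathbf{Y}-\mathbf{X}\mathbf{b}_{LS})^T(\mathbf{Y}-\mathbf{X}\mathbf{b}_{LS}) \tag{6.2.13}$$

which is the sum of squares of residuals. (It is also called the error sum of
squares and designated SSE.) By expanding $R_{LS}$ and employing the
expression given by (6.2.5) we can also write

$$R_{LS} = \mathbf{Y}^T\mathbf{Y} - \mathbf{b}_{LS}^T\mathbf{X}^T\mathbf{Y} = \mathbf{Y}^T\mathbf{Y} - \mathbf{b}_{LS}^T\mathbf{X}^T\mathbf{X}\mathbf{b}_{LS} \tag{6.2.14a,b}$$

since (6.2.4) and the scalar nature of $\mathbf{Y}^T\mathbf{X}\mathbf{b}_{LS}$ can be used to obtain

$$\mathbf{Y}^T\mathbf{X}\mathbf{b}_{LS} = \mathbf{b}_{LS}^T\mathbf{X}^T\mathbf{Y} = \mathbf{b}_{LS}^T\mathbf{X}^T\mathbf{X}\mathbf{b}_{LS} \tag{6.2.15}$$

The expression given by (6.2.14a) can be convenient to use for numerically
evaluating $R_{LS}$ because the individual residuals need not be calculated and
also because $\mathbf{X}^T\mathbf{Y}$ is known from the $\mathbf{b}_{LS}$ calculation as indicated by (6.2.5)
and (6.2.7).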

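For the standard assumptions of additive, zero mean, constant variance,
uncorrelated errors, errorless X matrix, and nonrandom parameters
(1111--11), the expected value of $R_{LS}$ given by (6.2.13) is obtained next.
Substituting (6.2.5) into (6.2.13) gives

$$\begin{aligned} R_{LS} &= (\mathbf{Y}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y})^T(\mathbf{Y}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}) \\ &= \mathbf{Y}^T(\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T)^T(\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T)\mathbf{Y} \\ &= \mathbf{Y}^T(\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T)\mathbf{Y} \end{aligned} \tag{6.2.16}$$

since $\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is symmetric and idempotent. Let us substitute
$\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$ for Y in (6.2.16) to find

$$R_{LS} = \boldsymbol{\varepsilon}^T(\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T)\boldsymbol{\varepsilon} \tag{6.2.17}$$

since $\mathbf{X}^T(\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T) = \mathbf{X}^T-\mathbf{X}^T = \mathbf{0}$. Note that $R_{LS}$ given by (6.2.17) is
a quadratic form in terms of $\boldsymbol{\varepsilon}$. Then using (6.1.50) with $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and
$\mathrm{cov}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$, the expected value of $R_{LS}$ is

$$E(R_{LS}) = \mathrm{tr}[\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T]\sigma^2 = \mathrm{tr}(\mathbf{I})\sigma^2 - \mathrm{tr}[(\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{X})]\sigma^2 = (n-p)\sigma^2 \tag{6.2.18}$$

It follows that an unbiased estimate of $\sigma^2$ (provided the assumptions
1111--11 are valid) is

$$s^2 = \frac{R_{LS}}{n-p} \tag{6.2.19}$$

This is a generalization of the results in Chapter 5 where p = 1 and 2.

A summary of the basic equations for OLS estimation is given in
Appendix B.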
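
The relations of this section can be checked numerically with the sketch below, which evaluates the residual sum of squares both from the residuals, as in (6.2.13), and from the shortcut (6.2.14a), and then forms the variance estimate (6.2.19). The five data points are hypothetical; NumPy is assumed.

```python
import numpy as np

# Hypothetical straight-line data with small scatter
t = np.arange(5.0)
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack([np.ones_like(t), t])
n, p = X.shape

XtY = X.T @ Y
b_ls = np.linalg.solve(X.T @ X, XtY)

residuals = Y - X @ b_ls
R_direct = residuals @ residuals      # (6.2.13): sum of squared residuals
R_short = Y @ Y - b_ls @ XtY          # (6.2.14a): no residuals needed

s2 = R_direct / (n - p)               # (6.2.19): estimate of sigma^2

# I - X (X^T X)^-1 X^T is symmetric and idempotent, with trace n - p
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
```

The shortcut (6.2.14a) subtracts two numbers of similar size, so when the residuals are very small compared with the observations the direct form (6.2.13) may retain more significant digits.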


6.2.5 Distribution of $R_{LS}$ and $\mathbf{b}_{LS}$

If in addition to the assumptions used above, the errors $\varepsilon_i$, $i = 1,\ldots,n$, have
a normal probability density, several theorems can be stated regarding $R_{LS}$
and $\mathbf{b}_{LS}$. They will be presented without proof.

Theorem 6.2.1

The quantity $R_{LS}/\sigma^2$ has the $\chi^2(n-p)$ distribution.

Theorem 6.2.2

The vector $\mathbf{b}_{LS}-\boldsymbol{\beta}$ is distributed as $N(\mathbf{0},\sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$.

Theorem 6.2.3

Let $s_i^2$ be the ith diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}s^2$. Then the quantity $(b_{i,LS}-\beta_i)/s_i$
has the $t(n-p)$ distribution.

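Another theorem is concerned with testing whether certain subsets of the
parameters are needed in the model. Let the parameter vector $\boldsymbol{\beta}$ be
partitioned so that $\boldsymbol{\beta}^T = [\boldsymbol{\beta}_1^T \;\; \boldsymbol{\beta}_2^T]$ and let X be partitioned conformably so
that $\mathbf{X} = [\mathbf{X}_1 \;\; \mathbf{X}_2]$. Then Y can be written as

$$\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon} \tag{6.2.20}$$

where $\boldsymbol{\beta}_1$ contains p − q elements, say, and $\boldsymbol{\beta}_2$ contains q elements. Assume
that the hypothesis to be tested simultaneously specifies all the components
of $\boldsymbol{\beta}_2$. One possible hypothesis is that $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^*$, where $\boldsymbol{\beta}_2^*$ could be a
zero vector. The test is based upon the relative reduction in the residual
sum of squares when all p parameters are estimated as compared with the
residual sum of squares when $\boldsymbol{\beta}_1$ is estimated with $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^*$. If the reduction
is large, the hypothesis that $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^*$ is untenable.
The least squares estimate for all the parameters is the usual expression

$$\mathbf{b}_{LS} = \begin{bmatrix} \mathbf{b}_{1,LS} \\ \mathbf{b}_{2,LS} \end{bmatrix} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \tag{6.2.21}$$

which has the associated residual sum of squares corresponding to (6.2.17)
of

$$R(\mathbf{b}_{LS}) = R(\mathbf{b}_{1,LS},\mathbf{b}_{2,LS}) = \boldsymbol{\varepsilon}^T(\mathbf{I}-\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T)\boldsymbol{\varepsilon} \tag{6.2.22}$$

Now suppose that $\boldsymbol{\beta}_2$ were set equal to $\boldsymbol{\beta}_2^*$ and estimates of $\boldsymbol{\beta}_1$, denoted by
$\mathbf{b}_{1,LS}(\boldsymbol{\beta}_2^*)$, are obtained,

$$\mathbf{b}_{1,LS}(\boldsymbol{\beta}_2^*) = (\mathbf{X}_1^T\mathbf{X}_1)^{-1}\mathbf{X}_1^T(\mathbf{Y}-\mathbf{X}_2\boldsymbol{\beta}_2^*) \tag{6.2.23a}$$

which produces the R expression,

$$R(\mathbf{b}_{1,LS}(\boldsymbol{\beta}_2^*),\boldsymbol{\beta}_2^*) = \boldsymbol{\varepsilon}^T(\mathbf{I}-\mathbf{X}_1(\mathbf{X}_1^T\mathbf{X}_1)^{-1}\mathbf{X}_1^T)\boldsymbol{\varepsilon} \tag{6.2.23b}$$

when $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^*$. Now we give the theorem.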

Theorem 6.2.4

The statistic

$$F = \frac{\{R[\mathbf{b}_{1,LS}(\boldsymbol{\beta}_2^*),\boldsymbol{\beta}_2^*] - R(\mathbf{b}_{1,LS},\mathbf{b}_{2,LS})\}/q}{R(\mathbf{b}_{1,LS},\mathbf{b}_{2,LS})/(n-p)} = \frac{\Delta R/q}{R(\mathbf{b}_{1,LS},\mathbf{b}_{2,LS})/(n-p)} \tag{6.2.24}$$

has the $F(q,n-p)$ distribution. See Section 2.8.10 for an F table. Also see
Section 5.7 for other discussion.

For a proof, see Goldfeld and Quandt [9, p. 46].

In using Theorem 6.2.4, the hypothesis that $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^*$ is tested at the $\alpha$
level of significance by comparing F of Theorem 6.2.4 with the critical
value $F_{1-\alpha}(q,n-p)$. If this critical value is exceeded, the hypothesis that
$\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^*$ is rejected. Note that the particular null vector $\boldsymbol{\beta}_2^* = \mathbf{0}$ can be
used to investigate whether the model should include the $\mathbf{X}_2\boldsymbol{\beta}_2$ terms; in
this case if the ratio in Theorem 6.2.4 is greater than $F_{1-\alpha}(q,n-p)$, then
we have an indication that the $\mathbf{X}_2\boldsymbol{\beta}_2$ terms may be needed.

Example 6.2.3 

A solid copper billet 1.82 in. (0.0462 m) long and 1 in. (0.0254 m) in diameter was
heated in a furnace and then removed. Two thermocouples were attached to the
billet. The temperatures, $Y_i$, given by one thermocouple are given in Table 6.2 as a
function of time. See also the plot of $Y_i$ versus time shown in Fig. 6.1. The other
thermocouple gave values about 0.2 F° (0.11 K) larger for smaller times and
decreased gradually to about 0.14 F° (0.08 K) larger at 1536 seconds. For a
constant temperature environment the thermocouple temperature readings have an
estimated standard deviation of 0.03 F° (0.017 K).


Table 6.2 Data for Example 6.2.3

Observation No.    Time (sec)    Temperature of Billet (°F)
 1                    0          279.59
 2                   96          264.87
 3                  192          251.53
 4                  288          239.30
 5                  384          228.18
 6                  480          217.24
 7                  576          207.86
 8                  672          199.36
 9                  768          191.65
10                  864          184.44
11                  960          177.64
12                 1056          171.41
13                 1152          165.04
14                 1248          159.89
15                 1344          155.19
16                 1440          150.78
17                 1536          146.68


Find a satisfactory power series approximation to the temperature history
of the billet given in Table 6.2. Assume that all the standard assumptions
except known $\sigma^2$ are valid. Use the F test at the 5% level of
significance.



Solution

The suggested model for the temperature in this case has the form

$$Y_i = \beta_1 + \beta_2t_i + \beta_3t_i^2 + \cdots + \beta_pt_i^{p-1} + \varepsilon_i$$

If the actual values of $t_i$ as given in Table 6.2 are used, the components of the $\mathbf{X}^T\mathbf{X}$
matrix would have disparate values. For example, the first diagonal term is 17 and
the third is $2.07\times10^{13}$. For this reason the model is written as

$$Y_i = \beta_1 + \beta_2\left(\frac{t_i}{\Delta t}\right) + \beta_3\left(\frac{t_i}{\Delta t}\right)^2 + \cdots + \beta_p\left(\frac{t_i}{\Delta t}\right)^{p-1} + \varepsilon_i \tag{a}$$

where $\Delta t$ is 96 sec.

There are many ways of determining the best model. One way for this model is
to calculate the parameters and residual sum of squares, R, for p and p − 1
parameters for p = 2, 3, 4, .... Then an F test based on Theorem 6.2.4 is used. In
each case we have q = 1 and the hypothesis is $\beta_p^* = 0$. The sums of squares, $R_{LS}$, for
p = 1 to 6 are listed in Table 6.3. The mean square ($R_{LS}$ divided by the number of
degrees of freedom) is also given. The F ratio is the $\Delta R$ value in the table,
$R(b_1,\ldots,b_{p-1}) - R(b_1,\ldots,b_p)$, divided by the mean square, $s^2$. This ratio is compared
with $F_{.95}(1,17-p)$, which is 4.75 and 4.84 for p = 5 and 6, respectively. Then
at the 5% level of significance, the p = 5 model is selected. (This decision is based
on the standard assumptions, 11111011, being valid.) The parameters corresponding
to equation a in English units are 279.660, −15.451, 0.71350, −0.023024, and
0.00039479 for $b_1,\ldots,b_5$, respectively.


Table 6.3 Sum of Squares and F Ratio for Example 6.2.3

     Degrees of        R(b_LS), Residual    Mean Square,
p    Freedom (d.f.)    Sum of Squares       s^2 = R/d.f.    ΔR           F = ΔR/s^2
1    16                27624.667
2    15                  894.491            59.633          26730.176    448.3
3    14                   16.1613            1.1544           878.33     760.9
4    13                    1.0959            0.0843            15.0654   178.8
5    12                    0.7186            0.0599             0.3773     6.30
6    11                    0.5781            0.0526             0.1405     2.67
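
The sequential F test of this example can be retraced from the Table 6.2 data with the sketch below. It uses the scaled variable $t_i/\Delta t = 0,1,\ldots,16$ of equation (a) and a general least squares solver (`numpy.linalg.lstsq`) in place of an explicit inverse; that solver is a numerical convenience chosen here and is not the calculation procedure described in the text. The helper name `residual_ss` is hypothetical.

```python
import numpy as np

# Billet temperatures (deg F) from Table 6.2, taken every 96 sec
T = np.array([279.59, 264.87, 251.53, 239.30, 228.18, 217.24, 207.86,
              199.36, 191.65, 184.44, 177.64, 171.41, 165.04, 159.89,
              155.19, 150.78, 146.68])
n = T.size
u = np.arange(n, dtype=float)            # scaled time t_i / (96 sec)

def residual_ss(p):
    """Residual sum of squares for the p-parameter power series model (a)."""
    X = np.vander(u, p, increasing=True)         # columns 1, u, ..., u^(p-1)
    b, *_ = np.linalg.lstsq(X, T, rcond=None)
    r = T - X @ b
    return r @ r

R = {p: residual_ss(p) for p in range(1, 7)}

# F ratio of Theorem 6.2.4 with q = 1: added-parameter reduction over mean square
F = {p: (R[p - 1] - R[p]) / (R[p] / (n - p)) for p in range(2, 7)}

for p in range(2, 7):
    print(f"p = {p}   R = {R[p]:12.4f}   s^2 = {R[p] / (n - p):9.4f}   F = {F[p]:8.2f}")
```

Comparing the printed F ratios with the tabulated critical values 4.75 (p = 5) and 4.84 (p = 6) quoted in the solution gives the same model choice as Table 6.3.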


The residuals for the p = 5 model are shown in upper portion of Fig. 6.1. They 
seem to be somewhat correlated and are even more so for smaller time steps. (See 
Fig. 6.6.) The assumption of uncorrelated measurements may not be valid. Hence 
the selection of a five-parameter model based on the uncorrelated errors assump- 
tion may not be correct. The p=4 model (cubic in t) might be actually more 
appropriate. 

The inverse of the $\mathbf{X}^T\mathbf{X}$ matrix is

$$(\mathbf{X}^T\mathbf{X})^{-1} = \begin{bmatrix} 0.7853 & -0.5581 & 0.1174 & -0.00945 & 2.58\times10^{-4} \\ & 0.6697 & -0.1709 & 0.0153 & -4.43\times10^{-4} \\ & & 0.04739 & -0.0044 & 1.34\times10^{-4} \\ & & & 4.305\times10^{-4} & -1.32\times10^{-5} \\ \text{symmetric} & & & & 4.13\times10^{-7} \end{bmatrix}$$

If the assumptions 11111011 are valid, the estimated covariance matrix of $\mathbf{b}_{LS}$ is
$(\mathbf{X}^T\mathbf{X})^{-1}$, given above, multiplied by $s^2$, which is 0.0599 (the mean square value in
Table 6.3 for p = 5). From these values we can find, for example, that the estimated
standard error of $b_1$ is $[0.7853(0.0599)]^{1/2} = 0.217$ and that of $b_5$ is 0.000157. A
comparison of these values with $b_1$ and $b_5$ shows that the relative uncertainty in $b_5$
is much greater than for $b_1$.

The value of $s = (0.0599)^{1/2} = 0.245$ F° is worthy of comment. First, for p = 6, 7,
and 8, its value would not decrease greatly. Next, this value of 0.245 is considerably
larger than 0.03 found for a constant, low temperature environment. It is, however,
close to the difference in temperature between the two thermocouples. At the
higher temperatures the specimen's temperature may be more sensitive to air
currents, etc. Also the calibration error over the whole temperature range is greater
than 0.03 F°. For these reasons additive errors with as large a standard error as
0.24 F° are quite possible.

Example 6.2.4 

Using the model for temperature found in the preceding example, estimate h in the
differential equation,

$$\rho cV\frac{dT}{dt} = hA(T_\infty - T)$$

This equation describes the temperature T of the billet assuming that there is
negligible internal resistance to heat flow. The density, $\rho$, of copper is 555 lb_m/ft^3
(8890 kg/m^3); c is the specific heat and has the value of 0.092 Btu/lb_m-°F (0.385
kJ/kg-K); V is volume; A is heated surface area; h is the heat transfer coefficient;
and $T_\infty$ is the air temperature.

Solution

Solving the above equation for h gives

$$h_i = \frac{\rho cV}{A(T_\infty - \hat{Y}_i)}\,\frac{d\hat{Y}_i}{dt}$$

where the true temperature T has been replaced by the estimated value $\hat{Y}$. The
ratio V/A is found to be 0.01634 ft (0.00498 m). The temperature $\hat{Y}$ and the
derivative are obtained from the power series model with p = 5. The derivatives are
calculated using

$$\frac{d\hat{Y}_i}{dt} = \frac{1}{\Delta t}\left[b_2 + 2b_3\left(\frac{t_i}{\Delta t}\right) + 3b_4\left(\frac{t_i}{\Delta t}\right)^2 + 4b_5\left(\frac{t_i}{\Delta t}\right)^3\right]$$

where the parameters are given in the preceding example and $\Delta t$ = 96 sec or 0.02667
hr. The temperature $T_\infty$ is the room temperature, which is nearly constant at 81.5
°F (300.6 K).

The $h_i$ estimates are depicted in Fig. 6.2. From knowledge of the heat transfer
process we know that the magnitudes of the h values are reasonable and that they
should drop with the temperature difference, $T - T_\infty$.



Figure 6.2 Heat transfer coefficient as a function of time for Example 6.2.4 ($i = t_i/\Delta t$). [Plot of the heat transfer coefficient versus the time step index, i.]

One advantage of this method of analysis for this problem is that it is not
necessary to specify a functional form for $h_i$ versus time. Another is that the
estimation for $h_i$ is a linear problem. Some disadvantages are that (1) a functional
form for T is needed; (2) sometimes this method yields $h_i$ values that are more
variable than are expected; and (3) the method is not "parsimonious" in its use of
parameters. The method that is usually recommended for this problem involves the
solution of the differential equation given above. This would give T as a nonlinear
function of time and h, and thus nonlinear estimation would be needed. See Section
7.5.2.
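
The calculation of Example 6.2.4 can be sketched as follows, using the five parameter values and the physical constants quoted in Examples 6.2.3 and 6.2.4. The variable names are illustrative, and the polynomial helpers `polyval` and `polyder` from `numpy.polynomial.polynomial` are a convenience assumed here for evaluating the power series and its derivative.

```python
import numpy as np
from numpy.polynomial import polynomial as poly

# p = 5 parameters of equation (a), English units, from Example 6.2.3
b = np.array([279.660, -15.451, 0.71350, -0.023024, 0.00039479])

dt_hr = 96.0 / 3600.0          # time step of 96 sec expressed in hours
rho = 555.0                    # density of copper, lb_m/ft^3
c = 0.092                      # specific heat, Btu/lb_m-F
V_over_A = 0.01634             # volume to surface area ratio, ft
T_inf = 81.5                   # room temperature, deg F

u = np.arange(17.0)            # time step index i = t_i / (96 sec)

Y_hat = poly.polyval(u, b)                         # fitted temperature
dY_dt = poly.polyval(u, poly.polyder(b)) / dt_hr   # derivative, deg F per hour

# Heat transfer coefficient from the lumped energy balance, Btu/hr-ft^2-F
h = rho * c * V_over_A * dY_dt / (T_inf - Y_hat)

for i, hi in zip(u.astype(int), h):
    print(f"i = {i:2d}   Y_hat = {Y_hat[i]:7.2f}   h = {hi:5.3f}")
```

Both the derivative and the temperature difference are negative during cooling, so the computed h values are positive; they decrease from roughly 2.4 at the first time step to roughly 1.8 at the last.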

6.2.6 Weighted Least Squares (WLS) 

There are cases when the covariance matrix of the errors is not known
but yet the experimenter wishes to include some general information
regarding the errors. He might know, for example, that as $Y_i$ increases in
magnitude the variance of $Y_i$ also increases. This was the case discussed
with the heat transfer example in Example 6.2.2, in which $\sigma_i/Y_i$ was about
constant. It might be appropriate to assume that some symmetric weighting
matrix $\mathbf{w}$ is given. This matrix may or may not be equal to the
inverse of the error covariance matrix.

In this estimation problem, the function to minimize for the linear model
of $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$ is

$$S_{WLS} = (\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^T\mathbf{w}(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}) \tag{6.2.25}$$

which yields the estimator

$$\mathbf{b}_{WLS} = (\mathbf{X}^T\mathbf{w}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{w}\mathbf{Y} \tag{6.2.26}$$

Using the standard assumptions of additive, zero mean errors and errorless
X and $\boldsymbol{\beta}$ (11----11), the covariance matrix of $\mathbf{b}_{WLS}$ can be shown using
(6.2.10) to be

$$\mathrm{cov}(\mathbf{b}_{WLS}) = (\mathbf{X}^T\mathbf{w}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{w}\boldsymbol{\psi}\mathbf{w}\mathbf{X}(\mathbf{X}^T\mathbf{w}\mathbf{X})^{-1} \tag{6.2.27}$$
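
A minimal sketch of the weighted least squares estimator (6.2.26) and its covariance matrix (6.2.27) is given below. The function names `wls` and `cov_wls`, the weights, and the data are hypothetical; NumPy is assumed.

```python
import numpy as np

def wls(X, Y, w):
    """b_WLS = (X^T w X)^-1 X^T w Y for a symmetric weighting matrix w."""
    return np.linalg.solve(X.T @ w @ X, X.T @ w @ Y)

def cov_wls(X, w, psi):
    """Covariance (6.2.27) of b_WLS when the error covariance matrix is psi."""
    A = np.linalg.solve(X.T @ w @ X, X.T @ w)   # b_WLS = A Y
    return A @ psi @ A.T

# Hypothetical data whose error standard deviation grows with the response
t = np.arange(1.0, 7.0)
X = np.column_stack([np.ones_like(t), t])
Y = np.array([2.1, 3.9, 6.2, 7.7, 10.4, 11.6])
psi = np.diag((0.05 * Y) ** 2)          # assumed error covariance
w = np.linalg.inv(psi)                  # weights chosen as psi^-1

b_w = wls(X, Y, w)
b_identity = wls(X, Y, np.eye(len(t)))  # w = I reproduces OLS
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
P_w = cov_wls(X, w, psi)
```

When the weighting matrix equals the inverse of the error covariance matrix, (6.2.27) collapses to $(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}$; for any other choice of weights the full product must be evaluated.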

6J ORTHOGONAL POLYNOMIALS IN 015 ESTIMATION 

The polynomial regression model with evenly spaced (m /) observations 
can be analyzed wuh ceriam computational simplifications thiough the use 
of orthogonal polynomials There are many different sets of orthogonal 
polynomials that could be employed Some are appropriate for use with 
uniform spacing m the variable / Others have been developed for other 
particular spacings and for weighted least squares, Himmelblau [10, p 
152] A set of orthogonal polynomials is given below for OLS and uniform 
spacing 

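Suppose that the model is given by the polynomial which is a function of
a single independent variable,

$Y_i = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \cdots + \beta_r t_i^r + \varepsilon_i$    (6.3.1)

and that $Y_i$ has been measured at $n$ values of $t_i$ which are evenly spaced, or
$t_i = t_0 + i\,\Delta t$. One can rewrite this model as

$Y_i = \alpha_0 p_0(t_i) + \alpha_1 p_1(t_i) + \alpha_2 p_2(t_i) + \cdots + \alpha_r p_r(t_i) + \varepsilon_i$    (6.3.2)

where $p_j(t)$ is a polynomial in $t$ of order $j$. The $p_j(t)$ functions are chosen so
that they are orthogonal, that is, for $j$ and $k = 0, 1, \ldots, r$,

$\sum_{i=1}^{n} p_j(t_i)\,p_k(t_i) = 0 \quad \text{for } j \neq k$    (6.3.3)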





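For these orthogonal polynomials, OLS yields

$\hat{\alpha}_m = \dfrac{\sum_{i=1}^{n} Y_i\,p_m(t_i)}{\sum_{i=1}^{n} p_m^2(t_i)}, \quad m = 0, 1, \ldots, r$    (6.3.4)

The first few orthogonal polynomials satisfying the above conditions are

$p_0(t_i) = 1$

$p_1(t_i) = \dfrac{t_i - \bar{t}}{\Delta t} = i - \dfrac{n+1}{2}$

$p_2(t_i) = p_1^2(t_i) - \dfrac{n^2 - 1}{12}$    (6.3.5)

where $\bar{t} = \sum t_i / n$. Notice that the $p_j(t_i)$ values are dimensionless and are
functions only of $n$ (for given $j$ and $i$ values). A recursive scheme for
obtaining $p_{j+1}(t_i)$ in terms of $p_j(t_i)$ and $p_{j-1}(t_i)$ is

$p_{j+1}(t_i) = p_1(t_i)\,p_j(t_i) - \dfrac{j^2(n^2 - j^2)}{4(4j^2 - 1)}\,p_{j-1}(t_i), \quad j = 1, 2, \ldots, r-1; \; i = 1, 2, \ldots, n$    (6.3.6)

The starting conditions on $j$ are $p_0(t_i) = 1$ and $p_1(t_i) = i - (n+1)/2$.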

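If desired, after the parameters $\alpha_0, \alpha_1, \ldots, \alpha_r$ have been estimated, one can
find the $\hat{\beta}_0, \ldots, \hat{\beta}_r$ from the $\hat{\alpha}_j$ values. For example, for $r = 0$, 1, and 2 we
have [since (6.3.1) and (6.3.2) are identical polynomials in $t$]

$r = 0$:  $\hat{\beta}_0 = \hat{\alpha}_0$

$r = 1$:  $\hat{\beta}_0 = \hat{\alpha}_0 - \hat{\alpha}_1 \dfrac{\bar{t}}{\Delta t}, \quad \hat{\beta}_1 = \dfrac{\hat{\alpha}_1}{\Delta t}$

$r = 2$:  $\hat{\beta}_0 = \hat{\alpha}_0 - \hat{\alpha}_1 \dfrac{\bar{t}}{\Delta t} + \hat{\alpha}_2\left[\left(\dfrac{\bar{t}}{\Delta t}\right)^2 - \dfrac{n^2 - 1}{12}\right], \quad \hat{\beta}_1 = \dfrac{\hat{\alpha}_1}{\Delta t} - \dfrac{2\hat{\alpha}_2\,\bar{t}}{(\Delta t)^2}, \quad \hat{\beta}_2 = \dfrac{\hat{\alpha}_2}{(\Delta t)^2}$    (6.3.7)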


Example 6.3.1 

Using the data of Example 6.2.2, which are evenly spaced in log Re, estimate
$\alpha_0, \ldots, \alpha_r$ and compare the results with those in Example 6.2.2. Let $r = 3$,
which causes $\eta$ to be a cubic in $t$.





Solution 

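Using the orthogonal polynomials given by (6.3.5), we can find for the given data
(the $i$, $t_i$, $Y_i$ table of Example 6.2.2) that $\Delta t = 1$, and $\mathbf{X}$ is given by

$\mathbf{X} = \begin{bmatrix} 1 & -3 & 5 & -6 \\ 1 & -2 & 0 & 6 \\ 1 & -1 & -3 & 6 \\ 1 & 0 & -4 & 0 \\ 1 & 1 & -3 & -6 \\ 1 & 2 & 0 & -6 \\ 1 & 3 & 5 & 6 \end{bmatrix}$

Note that $\mathbf{X}$ is composed of orthogonal vectors, which results in $\mathbf{X}^T\mathbf{X}$ reducing to
the diagonal matrix

$\mathbf{X}^T\mathbf{X} = \mathrm{diag}[\,7 \;\; 28 \;\; 84 \;\; 216\,]$

which has the simple inverse

$(\mathbf{X}^T\mathbf{X})^{-1} = \mathrm{diag}[\,7^{-1} \;\; 28^{-1} \;\; 84^{-1} \;\; 216^{-1}\,]$

These are the $\mathbf{X}$, $\mathbf{X}^T\mathbf{X}$, and $(\mathbf{X}^T\mathbf{X})^{-1}$ matrices for $r = 3$ and for any set of seven
equally spaced values of $t$. The $\mathbf{X}^T\mathbf{Y}$ vector becomes

$\mathbf{X}^T\mathbf{Y} = [\,5.8846 \;\; 12.7968 \;\; 3.0066 \;\; -0.1516\,]^T$

Then from either $\hat{\boldsymbol{\alpha}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ or from (6.3.4) we find

$\hat{\boldsymbol{\alpha}} = [\,0.84066 \;\; 0.45703 \;\; 0.03579 \;\; -0.00070\,]^T$

Since the polynomials $p_j(t)$ are orthogonal, the $\hat{\alpha}_j$'s are independent. That is,
regardless of the value of $r$ in (6.3.2) we find $\hat{\alpha}_0 = 0.84066$.
The residual sum of squares as $r$ increases can be easily calculated from the
above information. The total sum of squares $\mathbf{Y}^T\mathbf{Y}$ is given by

$\mathbf{Y}^T\mathbf{Y} = \sum Y_i^2 = 10.90349564$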


The residual sum of squares can be found using

The models for $r = 0$, 1, and 2 in terms of $p_j(t_i)$ are

$r = 0$:  $\hat{Y}_i = 0.84066$

$r = 1$:  $\hat{Y}_i = 0.84066 + 0.45703\,p_1(t_i)$

$r = 2$:  $\hat{Y}_i = 0.84066 + 0.45703\,p_1(t_i) + 0.03579\,p_2(t_i)$

where we note that each parameter $\hat{\alpha}_j$ is unchanged as $r$ is increased. In the
standard form given by (6.3.1) this is not true; using (6.3.7) yields

$r = 0$:  $\hat{Y}_i = 0.84066$

$r = 1$:  $\hat{Y}_i = -0.07340 + 0.45703\,t_i$

$r = 2$:  $\hat{Y}_i = -0.07340 + 0.31386\,t_i + 0.03579\,t_i^2$

where the parameter estimates change in most cases as $r$ is increased.
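
The construction used in Example 6.3.1 can be checked with a short script. The sketch below assumes the standard three-term recursion for monic orthogonal polynomials on $n$ evenly spaced points, builds the columns for $n = 7$ and $r = 3$, and divides the $\mathbf{X}^T\mathbf{Y}$ values quoted in the example by the diagonal of $\mathbf{X}^T\mathbf{X}$. The function name `orthogonal_columns` is hypothetical.

```python
def orthogonal_columns(n, r):
    """Return p_0..p_r evaluated at i = 1..n using the three-term recursion."""
    p = [[1.0] * n, [i - (n + 1) / 2 for i in range(1, n + 1)]]
    for j in range(1, r):
        # Recursion coefficient j^2 (n^2 - j^2) / (4 (4 j^2 - 1)).
        c = j * j * (n * n - j * j) / (4.0 * (4 * j * j - 1))
        p.append([p[1][i] * p[j][i] - c * p[j - 1][i] for i in range(n)])
    return p[: r + 1]

cols = orthogonal_columns(7, 3)

# Diagonal of X^T X: the sum of squares of each column.
norms = [sum(v * v for v in col) for col in cols]

# Off-diagonal terms of X^T X should vanish for orthogonal columns.
cross = [sum(a * b for a, b in zip(cols[j], cols[k]))
         for j in range(4) for k in range(j + 1, 4)]

# X^T Y as quoted in Example 6.3.1; the Y_i themselves are in Example 6.2.2.
XtY = [5.8846, 12.7968, 3.0066, -0.1516]
alpha = [s / d for s, d in zip(XtY, norms)]

print("columns:", cols)
print("diag(X^T X):", norms)
print("alpha:", [round(a, 5) for a in alpha])
```

Because each estimate is a single ratio, raising $r$ appends a new $\hat{\alpha}$ without altering the earlier ones.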


In the above example the $\mathbf{X}$ matrix for seven equally spaced observations
contained only whole numbers. If another column were added,
fractions would enter. To avoid fractions, tables have been prepared
[11,12] which contain factors $\lambda_j$ and elements $\phi_{ij}$. For the $n = 7$ case above,
a table frequently has the information as given in Table 6.4. The component
$X_{ij}$ of $\mathbf{X}$ is related to $\phi_{ij}$ by

$X_{ij} = p_j(t_i) = \dfrac{\phi_{ij}}{\lambda_j}, \quad j = 0, 1, \ldots, r; \; i = 1, 2, \ldots, n$    (6.3.8)


Table 6.4 Orthogonal Polynomials for n = 7

  i          phi_i0   phi_i1   phi_i2   phi_i3   phi_i4
  1             1       -3        5       -1        3
  2             1       -2        0        1       -7
  3             1       -1       -3        1        1
  4             1        0       -4        0        6
  5             1        1       -3       -1        1
  6             1        2        0       -1       -7
  7             1        3        5        1        3

  D_j           7       28       84        6      154
  lambda_j      1        1        1       1/6     7/12



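and $D_j$ is related to the $j$th diagonal term of $\mathbf{X}^T\mathbf{X}$ by

$D_j = \lambda_j^2 \sum_{i=1}^{n} p_j^2(t_i) = \sum_{i=1}^{n} \phi_{ij}^2$    (6.3.9)

Using these relations in (6.3.4) gives

$\hat{\alpha}_m = \dfrac{\lambda_m}{D_m} \sum_{i=1}^{n} Y_i\,\phi_{im}, \quad m = 0, 1, \ldots, r; \quad \lambda_0 = 1$    (6.3.10)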
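
As a check on how the tabulated quantities fit together, the following sketch evaluates $\hat{\alpha}_m$ both from the tabulated form (6.3.10) and directly from (6.3.4) with $p_m = \phi_m/\lambda_m$. The $\phi_{ij}$ and $\lambda_j$ values are those of Table 6.4; the observations `Y` are hypothetical, and exact rational arithmetic is used so that the two routes can be compared without rounding.

```python
from fractions import Fraction

# phi_ij and lambda_j as tabulated for n = 7 (Table 6.4), one list per j.
phi = [
    [1, 1, 1, 1, 1, 1, 1],
    [-3, -2, -1, 0, 1, 2, 3],
    [5, 0, -3, -4, -3, 0, 5],
    [-1, 1, 1, 0, -1, -1, 1],
    [3, -7, 1, 6, 1, -7, 3],
]
lam = [Fraction(1), Fraction(1), Fraction(1), Fraction(1, 6), Fraction(7, 12)]

# D_j is the sum of squares of the tabulated integers, as in (6.3.9).
D = [sum(v * v for v in col) for col in phi]

# Hypothetical observations at seven evenly spaced values of t.
Y = [Fraction(s) for s in ("0.1", "0.5", "0.9", "1.2", "1.4", "1.5", "1.7")]

def alpha_from_table(m):
    """Estimate alpha_m from the tabulated form (6.3.10)."""
    return lam[m] / D[m] * sum(y * f for y, f in zip(Y, phi[m]))

def alpha_direct(m):
    """Estimate alpha_m from (6.3.4) with p_m = phi_m / lambda_m."""
    p = [Fraction(f) / lam[m] for f in phi[m]]
    return sum(y * v for y, v in zip(Y, p)) / sum(v * v for v in p)

for m in range(5):
    print(m, float(alpha_from_table(m)), float(alpha_direct(m)))
```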

Some of the advantages of the use of orthogonal polynomials are the
following. First, no difficulty can occur in finding the inverse of the $\mathbf{X}^T\mathbf{X}$
matrix because it is a diagonal matrix with nonzero diagonal components.
If $r$ is increased by one, $(\mathbf{X}^T\mathbf{X})^{-1}$ is changed merely by the addition of a
diagonal element and the sum of squares is reduced by
$\hat{\alpha}_{r+1}^2 \sum_{i} p_{r+1}^2(t_i)$.
Second, for a number of cases the terms of $\mathbf{X}^T\mathbf{X}$ are tabulated, thereby
reducing the number of calculations. Third, the parameters $\hat{\alpha}_0, \hat{\alpha}_1, \ldots$ are
unchanged as additional values are estimated. Thus one can easily add
additional parameters by increasing the degree of the orthogonal polynomials.
Fourth, when orthogonal polynomials are used there is usually
less accumulation of rounding errors in the calculations of the estimates of
the parameters. A disadvantage of orthogonal polynomials is that they are
not convenient for sequential estimation; one cannot easily add another
observation. See Section 6.7. A further disadvantage is that polynomials,
orthogonal or not, are limited in their ability to fit functions. There is a
tendency to think that orthogonal polynomials can be used to fit any
data in a single independent variable provided the degree is high enough. The model
might be impractical, however, if the degree is higher than the fourth or
fifth, say. The intervals between the data points could have predicted $Y$
values that are quite oscillatory. For curves that seem to require even
higher degrees, splines are recommended [13].

6.4 FACTORIAL EXPERIMENTS 

6.4.1 Introduction

In Chapter 5 we found that the best design for estimating $\beta_1$ of Model 3 is
one in which one half of the observations are taken at the smallest
permissible value of $X_i$ and the other half at the largest. This is a
recommended procedure when the model is known to be linear in $X_i$ and
when the standard assumptions indicated by 1111--11 are valid. In other
cases there may be several independent variables $x_{i1}, x_{i2}, \ldots, x_{ir}$ which are
commonly called factors. The selection of prescribed values or levels of the
factors constitutes a design. There are many possible designs discussed in
the statistical literature. The one to be discussed below is the complete
factorial design with two levels for each factor. For one factor as in Model
3, the two levels are the lowest and highest permissible values of $X_i$.

A factor is termed quantitative when its possible levels can be ranked in 
magnitude or qualitative when they cannot. Examples of quantitative 
factors are temperature, pH, Reynolds number, and time. Examples of 
qualitative factors are the type of catalyst used and the presence or 
absence of a particular additive. 


6.4.2 Two-Level Factorial Design 
Consider the model

$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_r x_{ir} + \varepsilon_i$    (6.4.1)

where each $x_{ij}$ can assume only two values, termed "high" and "low."
There are then $2^r$ possible combinations of factor levels, which constitute a
complete $2^r$ factorial design.

Example 6.4.1 

In a certain chemical process the effects of three operating variables on the overall
process yield are to be studied. These are temperature ($x_{i1}$), type of catalyst ($x_{i2}$),
and pressure ($x_{i3}$). Over the ranges of temperature and pressure studied it is known
that the yield is linear in both temperature and pressure and thus two levels of each
can be chosen. Also two different catalysts are chosen. The following levels are
selected:

                      Low Level           High Level
  Temperature         400 K               420 K
  Type of catalyst    A                   B
  Pressure            2.0 x 10^5 N/m^2    2.2 x 10^5 N/m^2

Observe that two factors, temperature and pressure, are quantitative and the
remaining factor, type of catalyst, is qualitative. A $2^3$ experimental design for this
example is given in Table 6.5. When each factor is set at a certain level, and a test
is performed, the result is termed an experimental run. The runs are not necessarily
performed in the order given; rather, the order of the actual tests should be
random.



Table 6.5 A 2^3 Experimental Design for Example 6.4.1

  Run       Temperature    Catalyst    Pressure
  Number    (K)            Type        (N/m^2)
  1         400            A           2.0 x 10^5
  2         420            A           2.0 x 10^5
  3         400            B           2.0 x 10^5
  4         420            B           2.0 x 10^5
  5         400            A           2.2 x 10^5
  6         420            A           2.2 x 10^5
  7         400            B           2.2 x 10^5
  8         420            B           2.2 x 10^5


6.4.3 Coding the Factors

If $\mathbf{X}$ is composed of column vectors that are orthogonal, $\mathbf{X}^T\mathbf{X}$ is a diagonal
matrix. Hence the analysis can proceed in a similar manner as for orthogonal
polynomials. A set of orthogonal column vectors can be obtained by
coding. For quantitative factors the coding is typically given by

$\text{Coded temp} = 2\,\dfrac{\text{actual temp} - 0.5\,(\text{high temp} + \text{low temp})}{\text{high temp} - \text{low temp}}$

which produces either $+1$ or $-1$. The qualitative factors are also assigned
values of $+1$ and $-1$; for example, catalyst A could be signified by $-1$
and catalyst B by $+1$. For the case of three factors such as in Example
6.4.1 the $\mathbf{X}$ matrix for the $2^3$ factorial design can be given as

$\mathbf{X} = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$    (6.4.2)

In terms of (6.4.1) and Example 6.4.1, the first column is a set of
coefficients of $\beta_0$, the second for coded temperature, the third for the
coded catalyst type, and the last for coded pressure.





Note that $\mathbf{X}$ given by (6.4.2) contains orthogonal columns. This matrix
can be conceived of as representing a particular design for a two-level
complete factorial design for three factors. For one factor the upper left
$2 \times 2$ matrix in $\mathbf{X}$ is the $\mathbf{X}$ matrix for a complete factorial design. For a
two-factor, two-level complete factorial design the design matrix, if interactions
are assumed zero, is the upper left $4 \times 3$ matrix. For more than three
factors, the pattern of the elements of $\mathbf{X}$ in (6.4.2) can be repeated. For
example, if there are four factors a fifth column would be added which
would be a vector of eight $-1$'s followed by eight $+1$'s; the second
column would continue the $-1, 1, -1, 1$ pattern for a total of $2^4 = 16$
terms, etc.
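
The coding rule and the column pattern described above can be sketched as follows. The function names `code_level`, `two_level_design`, and `gram` are hypothetical; the row ordering is chosen so that the first factor alternates fastest, as in (6.4.2).

```python
from itertools import product

def code_level(actual, low, high):
    """Map a quantitative level onto -1 (low) ... +1 (high)."""
    return 2.0 * (actual - 0.5 * (high + low)) / (high - low)

def two_level_design(r):
    """Design matrix of a complete 2^r factorial: a column of ones
    followed by r coded factor columns, first factor varying fastest."""
    rows = []
    for levels in product((-1, 1), repeat=r):
        # product() varies the last entry fastest, so reverse each tuple.
        rows.append([1] + list(reversed(levels)))
    return rows

def gram(X):
    """Return X^T X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

X3 = two_level_design(3)
for row in X3:
    print(row)
print("X^T X =", gram(X3))
print(code_level(400.0, 400.0, 420.0), code_level(420.0, 400.0, 420.0))
```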


6.4.4 Inclusion of Interaction Terms in the Model 

The complete $2^r$ factorial design can be used in cases for which interaction
or cross product terms are included in the regression model. An example is

$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \varepsilon_i$    (6.4.3)

where $x_{i1} x_{i2}$ is an interaction term. The $\mathbf{X}$ matrix remains orthogonal.

For the model given by (6.4.3), the design matrix $\mathbf{X}$ can be

$\mathbf{X} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$    (6.4.4)

where the fourth column contains the products $x_{i1} x_{i2}$; $x_{i1}$ is given in the
second column and $x_{i2}$ in the third.

In such a design as indicated by (6.4.4) there are four observations and
four parameters and thus the predicted values $\hat{Y}_i$ exactly equal $Y_i$. Hence
no estimate of $\sigma^2$ can be obtained. If each run is replicated the same
number of times, then $\sigma^2$ can be estimated. One should be careful to
perform the experiments in a randomized order.
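
A small numerical sketch may make the point about replication concrete. With the design (6.4.4) and one observation per run the fit is exact, so nothing is left over to estimate $\sigma^2$; with two observations per run the scatter within each pair gives a pure-error estimate. All response values below are hypothetical, as are the function names.

```python
# Design matrix (6.4.4): columns 1, x1, x2, x1*x2.
X = [[1, -1, -1, 1],
     [1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, 1, 1, 1]]

def ols_orthogonal(X, Y):
    """OLS for a +/-1 orthogonal design, where X^T X = n I."""
    n = len(X)
    return [sum(row[j] * y for row, y in zip(X, Y)) / n
            for j in range(len(X[0]))]

def predict(X, b):
    return [sum(x * bj for x, bj in zip(row, b)) for row in X]

# One observation per run (hypothetical values): the fit is exact.
Y = [10.0, 14.0, 11.0, 19.0]
b = ols_orthogonal(X, Y)
Y_hat = predict(X, b)
print("b =", b, " residuals =", [y - yh for y, yh in zip(Y, Y_hat)])

# Two genuine replicates per run (hypothetical values).
Y_rep = [[10.0, 11.0], [14.0, 13.0], [11.0, 12.0], [19.0, 18.0]]
means = [sum(v) / len(v) for v in Y_rep]
b_rep = ols_orthogonal(X, means)   # same as OLS on all eight observations
pure_ss = sum((y - m) ** 2 for v, m in zip(Y_rep, means) for y in v)
dof = sum(len(v) - 1 for v in Y_rep)
s2 = pure_ss / dof                 # pure-error estimate of sigma^2
print("b (replicated) =", b_rep, " s^2 =", s2)
```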


6.4.5 Estimation 

We have designed $\mathbf{X}$ to be orthogonal for use with ordinary least
squares. The OLS estimator is given by

$\mathbf{b}_{OLS} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$    (6.4.5)

Since $\mathbf{X}$ is orthogonal and contains only $+1$ or $-1$ terms, the inverse of
$\mathbf{X}^T\mathbf{X}$ is easily found. Suppose that there are four parameters (three factors)
so that $\mathbf{X}$ is given by (6.4.2), and then

$\mathbf{X}^T\mathbf{X} = \mathrm{diag}[\,8 \;\; 8 \;\; 8 \;\; 8\,] = 8\mathbf{I}$

$(\mathbf{X}^T\mathbf{X})^{-1} = \tfrac{1}{8}\mathbf{I}$


Example 6.4.2

Estimate the parameters using OLS for the experimental design given in Table 6.5
and the $Y_i$ values given by

$\mathbf{Y}^T = [\,49 \;\; 62 \;\; 44 \;\; 58 \;\; 42 \;\; 73 \;\; 35 \;\; 69\,]$

where $Y$ is in percent.

(a) Estimate the parameters in a linear model permitting interaction using OLS.

(b) Suppose that the standard assumptions designated 11111011 apply. (These
include normal, independent, constant variance errors.) Also assume that an
estimate of the variance of $Y_i$ is 2.5, which we shall consider to have come from a
separate experiment and to have 20 degrees of freedom. Using the $F$ statistic, find a
parsimonious (in terms of the parameters) model.

Solution 

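(a) The complete model including all the interaction terms can be written in terms
of the coded factors $z_{ij}$ as

$\eta_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \beta_3 z_{i3} + \beta_4 z_{i1}z_{i2} + \beta_5 z_{i1}z_{i3} + \beta_6 z_{i2}z_{i3} + \beta_7 z_{i1}z_{i2}z_{i3}$

where $z_{i1}$, $z_{i2}$, and $z_{i3}$ correspond respectively to coded temperature, catalyst, and
pressure. The $\mathbf{X}$ matrix then becomes (columns 2 to 4 are the main effects and
columns 5 to 8 the interactions)

$\mathbf{X} = \begin{bmatrix} 1 & -1 & -1 & -1 & 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 & 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & 1 & -1 & 1 & -1 & -1 \\ 1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}$

which contains eight orthogonal columns. The inverse of $\mathbf{X}^T\mathbf{X}$ is $(1/8)\mathbf{I}$ and the
$\mathbf{X}^T\mathbf{Y}$ terms are obtained simply by taking sums of the $Y_i$ terms, each multiplied by
plus or minus one as indicated in $\mathbf{X}^T$. The resulting predicted equation is

$\hat{Y}_i = 54 + 11.5z_{i1} - 2.5z_{i2} + 0.75z_{i3} + 0.5z_{i1}z_{i2} + 4.75z_{i1}z_{i3} - 0.25z_{i2}z_{i3} + 0.25z_{i1}z_{i2}z_{i3}$

Since there are eight observations and eight parameters, the calculated $\hat{Y}_i$ values
and the measured values $Y_i$ are equal; hence every residual is zero.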

(b) The effect of each term in the model can be examined independently
because the design is orthogonal. This is an important benefit of such designs.
From (6.2.14a,b) the reduction in the sum of squares is given by

$\mathbf{b}^T\mathbf{X}^T\mathbf{Y} = \mathbf{b}^T\mathbf{X}^T\mathbf{X}\mathbf{b} = 8\,\mathbf{b}^T\mathbf{I}\,\mathbf{b} = 8\sum_j b_j^2$

where $\mathbf{X}^T\mathbf{X} = 8\mathbf{I}$ is used. The sum $\mathbf{Y}^T\mathbf{Y}$ is 24624. The residual sum of squares ($R$),
reduction in $R$, and the $F$ statistic are given in Table 6.6. The $F$ statistic is ($\Delta R/1$)
divided by $s^2$ ($= 2.5$). At the 5% level of significance, $F_{.95}(1, 20) = 4.35$. ($s^2$ was stated
to have 20 degrees of freedom.) We have an indication that only the first, second,
third, and sixth parameters are needed since only for these is $F > 4.35$. The first
parameter is $\beta_0$, the second is for coded temperature, the third is for the catalyst,
and the sixth is for the interaction between temperature and pressure. Since the
pressure enters through the interaction term, we also include the linear pressure
term (fourth parameter). We assume that these provide a parsimonious set of
parameters.

Table 6.6 Sum of Squares and F Ratio for Example 6.4.2

  No. of        R, Residual
  Parameters    Sum of Squares    Delta R      F
  1             1296              23328        9331.2
  2             238               1058         423.2
  3             188               50           20
  4             183.5             4.5          1.8
  5             181.5             2            0.8
  6             1                 180.5        72.2
  7             0.5               0.5          0.2
  8             0                 0.5          0.2

The model then is

$\hat{Y}_i = 54 + 11.5z_{i1} - 2.5z_{i2} + 0.75z_{i3} + 4.75z_{i1}z_{i3}$

or, in terms of the original factors,

$\hat{Y} = 54 + 11.5\,\dfrac{T - 410}{10} - 2.5C + 0.75\,\dfrac{P - 2.1 \times 10^5}{10^4} + 4.75\,\dfrac{T - 410}{10}\cdot\dfrac{P - 2.1 \times 10^5}{10^4}$

where $T$ is temperature in K, $C = -1$ for catalyst A, $C = 1$ for catalyst B, and $P$ is
pressure in N/m$^2$. This expression can be simplified to

$\hat{Y} = 3656.5 - 8.825T - 2.5C - 0.0194P + 4.75 \times 10^{-5}\,TP$

The predicted values $\hat{Y}_i$ are given in Table 6.7 along with the residuals.

Table 6.7 Predicted Values and Residuals for Example 6.4.2

             Observed     Predicted       Residual
  Run No.    Y_i (%)      Yhat_i (%)      Y_i - Yhat_i (%)
  1          49           49              0
  2          62           62.5            -0.5
  3          44           44              0
  4          58           57.5            0.5
  5          42           41              1.0
  6          73           73.5            -0.5
  7          35           36              -1
  8          69           68.5            0.5

6.4.6 Importance of Replicates

In the above example the significance of each term in the fitted model was
tested using an "external" estimate of the pure error variance. A superior
estimate of the pure variance may be obtained by replicating some or all of
the runs. In order to preserve the orthogonal character of a $2^r$ design it is
necessary to repeat all the $2^r$ runs the same number of times. It is
important, however, that the replicated run is a genuine repeat. Repeatedly
measuring the same response does not constitute a genuine replication. It
is better to interpose at least one change in a factor level before a run is
repeated to obtain an independent response. A random choice of the order
in which the runs are performed is important in this connection.


6.4.7 Other Experiment Designs 

Many other experimental designs are possible in addition to a complete
two-level factorial. For example, if there are two factors, and one is desired
at two levels and the other at three, then we would use a $2 \times 3$ factorial
design. The $\mathbf{X}^T\mathbf{X}$ matrix can also be made diagonal for this case.
Other possible designs are made up of treatments which are selected
from among the treatments of a complete factorial design, the selection
being made in such a way that the possibilities of eliminating certain
parameters are tested in a systematic manner which preserves the possibility
of making the remaining tests [14].





6.5 MAXIMUM LIKELIHOOD ESTIMATOR 

The treatment of the maximum likelihood (ML) method by some authors 
leaves the impression that the ML method applies only to cases of 
noncorrelated errors. In the ML method discussed below, however,
correlated errors may be present. Also the errors may have nonconstant
variance. With only these two exceptions, the standard assumptions are used;
these are denoted 11--1111. Note that the errors are assumed to have a
normal probability density. It is noteworthy that ML estimation assumes a 
great deal of information regarding e. This is in contrast to OLS estima- 
tion. 


6.5.1 ML Estimation 

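For the above assumptions the ML parameter estimator is derived by
minimizing the ML loss function given by (6.1.67b). For the linear model
$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$ we then seek to minimize

$S_{ML} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T \boldsymbol{\psi}^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$    (6.5.1)

with respect to the parameter vector $\boldsymbol{\beta}$. Using (6.1.26b) and (6.1.30) as in
Section 6.2.1 results in

$\mathbf{b}_{ML} = (\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{Y}$    (6.5.2)

By using $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ in (6.5.2) and taking the expected value of $\mathbf{b}_{ML}$, we obtain

$E(\mathbf{b}_{ML}) = \boldsymbol{\beta}$    (6.5.3)

so that $\mathbf{b}_{ML}$ is an unbiased estimator of $\boldsymbol{\beta}$. The covariance matrix of $\mathbf{b}_{ML}$ can be
found as in Section 6.2.3. Let

$\mathbf{b}_{ML} = \mathbf{A}(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) \quad \text{where} \quad \mathbf{A} \equiv (\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1}$    (6.5.4)

Then from (6.2.10) the covariance of $\mathbf{b}_{ML}$ is

$\mathrm{cov}(\mathbf{b}_{ML}) = \mathbf{A}\boldsymbol{\psi}\mathbf{A}^T = [(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1}]\,\boldsymbol{\psi}\,[\boldsymbol{\psi}^{-1}\mathbf{X}(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}]$

$\mathrm{cov}(\mathbf{b}_{ML}) = (\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1} \equiv \mathbf{P}_{ML}$    (6.5.5)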


Observe that the maximum likelihood estimator is exactly the same as
given by the Gauss-Markov theorem (Section 6.1.7). Thus the maximum
likelihood method produces a minimum variance unbiased linear estimator
for the linear model and the standard assumptions given. The only additional
assumption required by ML estimation and not by the Gauss-Markov
theorem is the knowledge that the errors have a normal probability
density.

The covariance of a collection of predicted points on the regression line
or surface represented by $\hat{\mathbf{Y}} = \mathbf{X}_1\mathbf{b}_{ML}$ is obtained in the same manner as is
(6.2.12a),

$\mathrm{cov}(\hat{\mathbf{Y}}) = \mathbf{X}_1(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}_1^T = \mathbf{X}_1\mathbf{P}_{ML}\mathbf{X}_1^T$    (6.5.6)

This expression simplifies to that given by OLS, (6.2.12b), if $\boldsymbol{\psi} = \sigma^2\mathbf{I}$.
The difference between LS and ML estimators given by (6.2.5) and
(6.5.2) is clearly the presence of the matrix $\boldsymbol{\psi}^{-1}$ in (6.5.2) but not in (6.2.5).
(Recall that $\boldsymbol{\psi}^{-1}$ is the inverse of the covariance matrix of the measurement
errors.) If $\boldsymbol{\psi} = \sigma^2\mathbf{I}$, the estimators are exactly the same. When $\boldsymbol{\psi}$
deviates considerably from the condition $\boldsymbol{\psi} = \sigma^2\mathbf{I}$, the LS and ML estimates can
be quite different. The LS estimators can have variances that are arbitrarily
larger than those given by maximum likelihood. One case in which
this can occur is when

$\boldsymbol{\psi} = \mathrm{diag}[\,\sigma_1^2 \;\cdots\; \sigma_n^2\,]$    (6.5.7)

and the $\sigma_i$ terms are quite disparate in magnitudes, such as the set of values
of 0.1, 1, 10, 100, 1000, and so on.

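Example 6.5.1

For $\boldsymbol{\psi}$ given by (6.5.7) derive an expression for the ratio of the variance of $b_{LS}$ to
the variance of $b_{ML}$ for the model $\eta_i = \beta X_i$ and simplify the result for $X_i = 1$.

Solution

From (6.2.11) the variance of $b_{LS}$ is found to be

$V(b_{LS}) = \left[\sum_{i=1}^{n} X_i^2\right]^{-2} \sum_{i=1}^{n} X_i^2\sigma_i^2$

and from (6.5.5) the variance of $b_{ML}$ is

$V(b_{ML}) = \left[\sum_{i=1}^{n} X_i^2\sigma_i^{-2}\right]^{-1}$

Then the ratio is

$\dfrac{V(b_{LS})}{V(b_{ML})} = \left[\sum_{i=1}^{n} X_i^2\sigma_i^2\right]\left[\sum_{i=1}^{n} X_i^2\sigma_i^{-2}\right]\left[\sum_{i=1}^{n} X_i^2\right]^{-2}$

which for $X_i = 1$ reduces to

$\dfrac{V(b_{LS})}{V(b_{ML})} = \dfrac{1}{n^2}\left[\sum_{i=1}^{n} \sigma_i^2\right]\left[\sum_{i=1}^{n} \sigma_i^{-2}\right]$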

Example 6.5.2 

Investigate the solution in Example 6.5.1 for $X_i = 1$ and $\sigma_1 = 1$, $\sigma_2 = 10$, $\sigma_3 = 100$,
$\sigma_4 = 1000$, etc.

Solution

For $n = 2$, the ratio is $(1/2)^2 (101)(1.01) = 25.5$. For $n = 3$, it is $(1/3)^2 (10101)
(1.0101) = 1133.7$. In general the approximate result is $10^{2(n-1)}/n^2$, which
increases rapidly with $n$. Hence in such cases the ML estimate is far superior to the
OLS estimate.


Example 6.5.3 

Investigate the solution of Example 6.5.1 for $X_i = 1$, 2, 3, and 4 and for $\sigma_i = 1$, 10,
100, and 1000, respectively.


Solution

The ratios for $n = 1$, 2, 3, and 4 are respectively 1, 16.68, 480.1, and 18610. The ratio
does not increase as rapidly as for the model $\eta = \beta$ but still indicates the superiority of
maximum likelihood estimation compared with ordinary least squares for unequal
error variances.
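
The ratios quoted in Examples 6.5.2 and 6.5.3 follow directly from the expression derived in Example 6.5.1 for the one-parameter model $\eta_i = \beta X_i$ with independent errors. A minimal sketch (the function name `variance_ratio` is hypothetical):

```python
def variance_ratio(X, sigma):
    """V(b_LS) / V(b_ML) for eta_i = beta * X_i with independent errors
    whose standard deviations are given in sigma."""
    sxx = sum(x * x for x in X)
    v_ls = sum((x * s) ** 2 for x, s in zip(X, sigma)) / sxx ** 2
    v_ml = 1.0 / sum((x / s) ** 2 for x, s in zip(X, sigma))
    return v_ls / v_ml

sigma = [1.0, 10.0, 100.0, 1000.0]

# Example 6.5.2: X_i = 1, first n observations.
ratios_652 = [variance_ratio([1.0] * n, sigma[:n]) for n in (2, 3)]

# Example 6.5.3: X_i = 1, 2, 3, 4.
X = [1.0, 2.0, 3.0, 4.0]
ratios_653 = [variance_ratio(X[:n], sigma[:n]) for n in (1, 2, 3, 4)]

print(ratios_652)
print(ratios_653)
```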

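If the measurement errors are correlated, the $\boldsymbol{\psi}$ matrix is not diagonal.
Some cases that produce nondiagonal $\boldsymbol{\psi}$ matrices are those involving
autoregressive and moving average errors. See Sections 5.13 and 6.9.
Typical terms in $\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X}$ and $\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{Y}$ for the single response case are

$[\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X}]_{ij} = \sum_{k=1}^{n}\sum_{l=1}^{n} X_{ki}\,(\boldsymbol{\psi}^{-1})_{kl}\,X_{lj}, \quad i, j = 1, 2, \ldots, p$    (6.5.8a)

$[\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{Y}]_{i} = \sum_{k=1}^{n}\sum_{l=1}^{n} X_{ki}\,(\boldsymbol{\psi}^{-1})_{kl}\,Y_{l}, \quad i = 1, \ldots, p$    (6.5.8b)

where $(\boldsymbol{\psi}^{-1})_{kl}$ is the $kl$ component of $\boldsymbol{\psi}^{-1}$. If $\boldsymbol{\psi}$ is given by (6.5.7) the double
summations above can be replaced by single summations as in

$[\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X}]_{ij} = \sum_{k=1}^{n} X_{ki}X_{kj}\,\sigma_k^{-2}, \quad i, j = 1, 2, \ldots, p$    (6.5.9a)

$[\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{Y}]_{i} = \sum_{k=1}^{n} X_{ki}Y_{k}\,\sigma_k^{-2}, \quad i = 1, 2, \ldots, p$    (6.5.9b)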

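6.5.2 Estimation of $\sigma^2$

One advantage of the ML formulation is that it can provide a direct
method for evaluating certain "statistical" parameters such as the variance
of the errors. For example, assume that the covariance matrix $\boldsymbol{\psi}$ is known
except for the multiplicative constant $\sigma^2$ (assumptions 11--1011),

$\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$    (6.5.10)

where $\boldsymbol{\Omega}$ is completely known but $\sigma^2$ is unknown. We start the analysis
with the logarithm of the likelihood function as given by (6.1.67a) (with
$N = n$),

$\ln L(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{Y}) = -\tfrac{1}{2}\left[\,n \ln 2\pi + \ln(\sigma^{2n}|\boldsymbol{\Omega}|) + S_{ML}\,\right]$    (6.5.11a)

where

$S_{ML} = \sigma^{-2}(\mathbf{Y} - \boldsymbol{\eta})^T\boldsymbol{\Omega}^{-1}(\mathbf{Y} - \boldsymbol{\eta})$    (6.5.11b)

Take the derivative with respect to $\sigma^2$ and with respect to the parameters
$\beta_1, \ldots, \beta_p$ for the linear model $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$. Replacing $\sigma^2$ by $\hat{\sigma}^2_{ML}$ and $\boldsymbol{\beta}$ by $\mathbf{b}_{ML}$ and
setting the derivatives equal to zero gives

$-\dfrac{n}{\hat{\sigma}^2_{ML}} + \dfrac{R_{ML}(\mathbf{b}_{ML})}{\hat{\sigma}^4_{ML}} = 0$    (6.5.12)

$\mathbf{X}^T\boldsymbol{\Omega}^{-1}(\mathbf{Y} - \mathbf{X}\mathbf{b}_{ML}) = \mathbf{0}$    (6.5.13)

where

$R_{ML}(\mathbf{b}_{ML}) = (\mathbf{Y} - \hat{\mathbf{Y}})^T\boldsymbol{\Omega}^{-1}(\mathbf{Y} - \hat{\mathbf{Y}}), \quad \hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}_{ML}$    (6.5.14)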


Note that the physical or structural parameters, $\boldsymbol{\beta}$, can be estimated
directly from (6.5.13) without knowledge of $\sigma^2$. Using these estimated
values permits the estimation of $\sigma^2$ from (6.5.12) as

$\hat{\sigma}^2_{ML} = \dfrac{R_{ML}}{n}$    (6.5.15)

An advantage of the maximum likelihood method is that it can provide a
direct method for estimating $\sigma^2$. The estimator is unfortunately biased,
however. An unbiased estimator for $\sigma^2$ is

$s^2 = \dfrac{R_{ML}}{n - p}$    (6.5.16)

See Section 6.8.3 for a derivation. Since (6.5.16) is unbiased, it is recommended
for estimating $\sigma^2$ when $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$. A summary of the basic ML
estimation equations is given in Appendix B.
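
The size of the bias can be written down in one line. The sketch below assumes the result $E(R_{ML}) = n - p$ obtained in Section 6.5.3 for $R_{ML}$ formed with $\boldsymbol{\psi}^{-1}$; when $R_{ML}$ is instead formed with $\boldsymbol{\Omega}^{-1} = \sigma^2\boldsymbol{\psi}^{-1}$, as in (6.5.14), the expectation is multiplied by $\sigma^2$.

```latex
\begin{align*}
E\!\left[R_{ML}(\mathbf{b}_{ML})\right] &= \sigma^2 (n - p)
   && \text{($R_{ML}$ formed with $\boldsymbol{\Omega}^{-1}$)} \\
E\!\left(\hat{\sigma}^2_{ML}\right) &= \frac{E(R_{ML})}{n}
   = \frac{n - p}{n}\,\sigma^2 \;<\; \sigma^2
   && \text{(biased low)} \\
E\!\left(s^2\right) &= \frac{E(R_{ML})}{n - p} = \sigma^2
   && \text{(unbiased)}
\end{align*}
```

For example, with $n = 10$ observations and $p = 3$ parameters the ML estimator is on average 30% too small, which is why the divisor $n - p$ is preferred for small samples.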

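Theorem 6.2.4 regarding an $F$ distribution can also be stated for the ML
assumptions 11--1011 for a linear model. The statistic

$F = \dfrac{\Delta R}{s^2 q}$    (6.5.17)

where $\Delta R$ is the change in $R_{ML}$ associated with the $q$ parameters being tested,
has the $F(q, n-p)$ distribution if the hypothesized values of those $q$ parameters
are the true values. This result can be used to
build "parsimonious" models.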


Example 6.5.4 

The thermal conductivity, $k$, of Armco iron has been measured using a new
method discussed in Beck and Al-Araji [15]. Temperatures between 100 and 362°F
(311 and 456 K) were covered along with power inputs between 272 and 602 W as
given in Table 6.8. The temperature and power can be considered to be measured
so much more accurately than $k$ that the error will be assumed to
be in the $k$ measurement only. Also it is reasonable to assume additive errors.
Comparison of values of measurements from different laboratories frequently
indicates a nonzero mean for the errors. Nevertheless a zero mean value for $\varepsilon_i$ is
assumed.

Each $Y_i$ value given in the last column of Table 6.8 is the average of four values
of the conductivity. The values are given to varying significant figures because the
original values were given to four figures and sometimes the average yielded four
digits and other times more. Notice that there are approximately two power levels,
about 275 and 550. The estimated standard deviation found using the four
observations for each $P$ and $T$ is about 0.278 for the smaller powers, whereas for the
higher power the estimated standard deviation is 0.16, smaller by a factor of about
2. Hence the variance for the even-numbered runs of Table 6.8 is about four times
that for the odd-numbered ones, which are for the larger powers.



Table 6.8 Data for Thermal Conductivity Measurements of
Armco Iron for Example 6.5.4

  Run    Temp    Power    Measured Thermal
  No.    (°F)    (W)      Conductivity (Btu/hr-ft-°F)
  1      100     545      41.60
  2      90      276      42.345
  3      161     602      37.7875
  4      149     275      39.5375
  5      227     538      36.4975
  6      206     274      37.3525
  7      270     550      35.785
  8      247     274      36.36
  9      362     522      34.53
  10     352     272      33.915


The objective is to find a parsimonious model of thermal conductivity versus
temperature $T$ and power $P$ using maximum likelihood. The errors $\varepsilon_i$ are assumed
to be independent and to have a normal probability density.

Solution

The assumptions mentioned above of additive, zero mean, independent, and
normal errors in $Y_i$ but errorless $T_i$ and $P_i$ can be designated 11011011. We can
write the covariance matrix $\boldsymbol{\psi}$ as

$\boldsymbol{\psi} = \sigma^2\,\mathrm{diag}[\,1 \;\; 4 \;\; 1 \;\; 4 \;\; 1 \;\; 4 \;\; 1 \;\; 4 \;\; 1 \;\; 4\,]$

where $\sigma^2$ is unknown. For the purpose of estimating the parameters, the statistical parameter
$\sigma^2$ need not be known. An estimate of $\sigma^2$ is given by using the standard deviation
values quoted in the example.

A plot of the data is shown by Fig. 6.3. It appears that at least a second degree
curve in $T$ may be required and that $P$ may not be needed. From physical
considerations it is also expected that $k$ is not a function of $P$. There are many
possible models and a number of ways of proceeding. Some possible models are as
follows:

1. $k = \beta_1$
2. $k = \beta_1 + \beta_2 T$
3. $k = \beta_1 + \beta_3 P$
4. $k = \beta_1 + \beta_2 T + \beta_3 P$
5. $k = \beta_1 + \beta_2 T + \beta_3 T^2$
6. $k = \beta_1 + \beta_2 T + \beta_3 T^2 + \beta_4 T^3$

One way to build the model is to start with the simplest and add one term at a
time. As one progresses the need of adding terms can be assessed by examining the
reduction in the $R_{ML}$ value and by utilizing the $F$ test based on Theorem 6.2.4.

Figure 6.3 Thermal conductivities measured for Armco iron and used in Example 6.5.4. (Abscissa: temperature, K.)

With $\boldsymbol{\psi}$ as given above the parameters were estimated using (6.5.2) with the
components given by (6.5.9). The mean square, $s^2$, is found from (6.5.16). The
results of the calculations including the $F$ statistic ($= \Delta R/s^2$) are tabulated in
Table 6.9. These $F$ values can be compared with those given by $F_{.95}(1, n-p)$ which
are 5.99, 5.59, 5.32, and 5.12 for $n - p = 6$, 7, 8, and 9, respectively. From a
comparison of these $F$ values with those in Table 6.9, it appears that the power
factor, $P$, need not be used. Also Theorem 6.2.4 leads us to select Model 5 because
Model 6 has an unnecessary additional term since $F = 5.729 < 5.99 = F_{.95}(1, 6)$.
However, since these values of 5.729 and 5.99 are so close, it might be that another
experiment or a larger sample size might show significance, that is, indicate Model
6 in preference to Model 5.

Table 6.9 Results of Calculations of R and F for Example 6.5.4

  Model    No. of        Degrees of               Mean
  No.      Parameters    Freedom       R_ML       Square, s^2    Delta R (a)    F
  1        1             9             40.0078    4.445          8769.36        1973
  2        2             8             4.47506    0.5594         35.5327        63.52
  3        2             8             39.91282   4.9891         0.09495        0.01903
  4        3             7             4.18133    0.5973         0.29373        0.4918
  5        3             7             1.13092    0.1616         3.34413        20.70
  6        4             6             0.57852    0.09642        0.55241        5.729

  (a) Delta R_2 = R_ML,1 - R_ML,2, Delta R_4 = R_ML,2 - R_ML,4, etc.

For Model 5 the regression equation for $k$ is (in English units)

$\hat{k} = 47.5191 - 0.07084\,T + 9.669 \times 10^{-5}\,T^2$

and the estimated covariance matrix of the estimated parameter vector is

$\mathrm{est.\ cov}(\mathbf{b}_{ML}) = s^2 \begin{bmatrix} 6.741 & -6.133 \times 10^{-2} & 1.242 \times 10^{-4} \\ & 6.080 \times 10^{-4} & -1.282 \times 10^{-6} \\ \text{symmetric} & & 2.796 \times 10^{-9} \end{bmatrix}$

where $s^2 = 0.1616$. The variances of $b_{1,ML}$, $b_{2,ML}$, and $b_{3,ML}$ are the diagonal terms.
The estimated standard errors (std. dev.) for them are

$\mathrm{est.\ s.e.}(b_{1,ML}) = [0.1616(6.741)]^{1/2} = 1.04$

$\mathrm{est.\ s.e.}(b_{2,ML}) = [0.1616(6.080 \times 10^{-4})]^{1/2} = 0.00991$

$\mathrm{est.\ s.e.}(b_{3,ML}) = 2.13 \times 10^{-5}$

The sizes of the estimated standard errors of the $b_{i,ML}$ reveal the relative
importance of the terms in describing $k$. The estimated standard error of $b_{1,ML}$ is
1.04, whereas one standard error in $b_{3,ML}$ gives a value of 2.6 for the $T^2$ term.
Hence the effects on $k$ due to uncertainty in $b_{1,ML}$ and $b_{3,ML}$ are of
about the same size.
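
The Model 5 calculation can be sketched in plain Python using the diagonal-weight sums (6.5.9), the estimator (6.5.2), and the mean square (6.5.16). The data are those of Table 6.8 with $\boldsymbol{\Omega} = \mathrm{diag}[1\ 4\ \cdots\ 1\ 4]$; the small `inverse` helper is an assumption of this sketch and is adequate only for small matrices such as this $3 \times 3$ case.

```python
T = [100, 90, 161, 149, 227, 206, 270, 247, 362, 352]          # deg F
k = [41.60, 42.345, 37.7875, 39.5375, 36.4975,
     37.3525, 35.785, 36.36, 34.53, 33.915]                    # Btu/hr-ft-F
omega = [1.0, 4.0] * 5       # diagonal of Omega: even-numbered runs have 4x variance

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; adequate for small matrices."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

n, p = len(T), 3
X = [[1.0, float(t), float(t) ** 2] for t in T]                 # Model 5

# X^T Omega^-1 X and X^T Omega^-1 Y via the single sums (6.5.9a,b).
A = [[sum(X[i][r] * X[i][c] / omega[i] for i in range(n)) for c in range(p)]
     for r in range(p)]
g = [sum(X[i][r] * k[i] / omega[i] for i in range(n)) for r in range(p)]

P = inverse(A)                                                  # (X^T Omega^-1 X)^-1
b = [sum(P[r][c] * g[c] for c in range(p)) for r in range(p)]   # (6.5.2)

resid = [k[i] - sum(X[i][c] * b[c] for c in range(p)) for i in range(n)]
R_ml = sum(e * e / w for e, w in zip(resid, omega))             # (6.5.14)
sigma2_ml = R_ml / n                                            # (6.5.15), biased
s2 = R_ml / (n - p)                                             # (6.5.16), unbiased
se = [(s2 * P[j][j]) ** 0.5 for j in range(p)]

print("b_ML =", b)
print("R_ML =", R_ml, " s^2 =", s2)
print("est. s.e. =", se)
```

The normal equations here are poorly scaled because $T^4$ reaches about $10^{10}$; for higher-degree models it would be safer to rescale $T$ (or use orthogonal polynomials, as in Section 6.3) before inverting.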

Example 6.5.5

Find the estimated standard error in $\hat{Y}_i$ for the data of Example 6.5.4.

Solution

The estimated standard error of $\hat{Y}_i$ is found by taking the square root of the
diagonal terms of (6.5.6) with $\mathbf{P}_{ML}$ replaced by est. cov($\mathbf{b}_{ML}$), which is displayed just
above. For arbitrary $T_i$ an expression for est. $V(\hat{Y}_i)$, the $i$th diagonal term of est.
cov($\hat{\mathbf{Y}}$), is

$\mathrm{est.}\ V(\hat{Y}_i) = \left[P'_{11} + 2T_iP'_{12} + T_i^2(P'_{22} + 2P'_{13}) + 2T_i^3P'_{23} + T_i^4P'_{33}\right]s^2$

where $P'_{ij} = P_{ij}/\sigma^2$. From above, note that a typical $P'_{ij}$ is $P'_{11} = 6.741$. Also $s^2 =
0.1616$. The est. s.e.($\hat{Y}_i$), which is the square root of est. $V(\hat{Y}_i)$, is displayed as the
curve in Fig. 6.4.
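
The expression for est. $V(\hat{Y}_i)$ can be evaluated directly from the rounded values printed in Example 6.5.4. The sketch below does so; note that the individual terms are large and of alternating sign, so the four-figure values quoted in the text limit the accuracy of the result. The function name `est_se` is hypothetical.

```python
s2 = 0.1616

# Elements P'_ij as printed in Example 6.5.4 (upper triangle, 1-based indices).
Pp = {
    (1, 1): 6.741, (1, 2): -6.133e-2, (1, 3): 1.242e-4,
    (2, 2): 6.080e-4, (2, 3): -1.282e-6,
    (3, 3): 2.796e-9,
}

def est_se(T):
    """Estimated standard error of the predicted conductivity at temperature T (deg F)."""
    var = s2 * (Pp[1, 1]
                + 2 * T * Pp[1, 2]
                + T ** 2 * (Pp[2, 2] + 2 * Pp[1, 3])
                + 2 * T ** 3 * Pp[2, 3]
                + T ** 4 * Pp[3, 3])
    return var ** 0.5

for T in (100.0, 200.0, 300.0):
    print(T, round(est_se(T), 3))
```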



Figure 6.4 Residuals of thermal conductivity values for Example 6.5.4. (Ordinate: residual, $Y_i - \hat{Y}_i$, Btu/hr-ft-°F; abscissa: temperature, 0 to 400°F, 300 to 450 K.)

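6.5.3 Expected Values of $S_{ML}$ and $R_{ML}$

The expected value of $S_{ML}$ (parameters not estimated) given by (6.5.1)
using (6.1.50) can be written with $\boldsymbol{\psi}$ being known as

$E(S_{ML}) = \mathrm{tr}[\boldsymbol{\psi}^{-1}\,\mathrm{cov}\,\boldsymbol{\varepsilon}] = \mathrm{tr}[\boldsymbol{\psi}^{-1}\boldsymbol{\psi}] = n$    (6.5.18)

where $n$ is the number of observations.

In finding $E(R_{ML})$ let

$R_{ML} = (\mathbf{Y} - \mathbf{X}\mathbf{b}_{ML})^T\boldsymbol{\psi}^{-1}(\mathbf{Y} - \mathbf{X}\mathbf{b}_{ML})$    (6.5.19)

and use $\mathbf{A}$ defined in (6.5.4). Then (6.5.19) can be written as

$R_{ML} = \mathbf{Y}^T(\mathbf{I} - \mathbf{X}\mathbf{A})^T\boldsymbol{\psi}^{-1}(\mathbf{I} - \mathbf{X}\mathbf{A})\mathbf{Y} = \mathbf{Y}^T\boldsymbol{\psi}^{-1}(\mathbf{I} - \mathbf{X}\mathbf{A})\mathbf{Y}$    (6.5.20a)

since

$\mathbf{X}^T\boldsymbol{\psi}^{-1}(\mathbf{I} - \mathbf{X}\mathbf{A}) = \mathbf{X}^T\boldsymbol{\psi}^{-1}[\mathbf{I} - \mathbf{X}(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1}] = \mathbf{0}$    (6.5.20b)

Introducing $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ in (6.5.20a) yields

$R_{ML} = \boldsymbol{\varepsilon}^T\boldsymbol{\psi}^{-1}(\mathbf{I} - \mathbf{X}\mathbf{A})\boldsymbol{\varepsilon}$    (6.5.21)

where (6.5.20b) and the following relation are used:

$(\mathbf{I} - \mathbf{X}\mathbf{A})\mathbf{X} = [\mathbf{I} - \mathbf{X}(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1}]\mathbf{X} = \mathbf{0}$    (6.5.22)

From (6.5.21), $R_{ML}$ is given by

$R_{ML} = \boldsymbol{\varepsilon}^T\boldsymbol{\psi}^{-1/2}\left[\mathbf{I} - \boldsymbol{\psi}^{-1/2}\mathbf{X}(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1/2}\right]\boldsymbol{\psi}^{-1/2}\boldsymbol{\varepsilon}$    (6.5.23)

Then taking the expected value of (6.5.23) and utilizing (6.1.50) we find

$E(R_{ML}) = \mathrm{tr}\left(\left[\mathbf{I} - \boldsymbol{\psi}^{-1/2}\mathbf{X}(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\psi}^{-1/2}\right]\mathrm{cov}(\boldsymbol{\psi}^{-1/2}\boldsymbol{\varepsilon})\right)$

$\qquad\quad\;\; = \mathrm{tr}(\mathbf{I}) - \mathrm{tr}\left[(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})^{-1}(\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X})\right] = n - p$    (6.5.24)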

The above expression for $E(R_{ML})$ can be utilized to check the validity of
the model and the $\boldsymbol{\psi}$ matrix. It is important to have a check because in
many cases there is some uncertainty in $\boldsymbol{\eta}$ or in $\boldsymbol{\psi}$. For the assumptions
used above (11--1111), $R_{ML}$ has a $\chi^2$ distribution with $n - p$ degrees of
freedom; this then can provide a statistical test. Table 2.14 gives the $\chi^2_\alpha$
value for which $P(\chi^2 < \chi^2_\alpha)$ is a specified value. For example, there is a
column of values of $\chi^2_{.95}$ for which the probability of $\chi^2$ being less than $\chi^2_{.95}$ is
equal to .95, or

$P(\chi^2 < \chi^2_{.95}) = .95$    (6.5.25)
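
The result $E(R_{ML}) = n - p$ and the use of the $\chi^2$ limits can be illustrated by simulation. The sketch below is a Monte Carlo check under assumed conditions: a straight-line model with $n = 9$ and $p = 2$ (seven degrees of freedom), known diagonal $\boldsymbol{\psi}$ with alternating variances 1 and 4, and arbitrary true parameters. The limits 2.167 and 14.07 are the tabulated 5% and 95% points of $\chi^2$ with seven degrees of freedom.

```python
import random

def simulate_r_ml(n_trials=20000, seed=1):
    """Simulate R_ML for a weighted straight-line fit with known diagonal psi."""
    rng = random.Random(seed)
    t = [float(i) for i in range(1, 10)]                     # n = 9 observations
    var = [1.0 if i % 2 == 0 else 4.0 for i in range(9)]     # diagonal of psi
    beta0, beta1 = 2.0, 0.5                                  # arbitrary true values
    w = [1.0 / v for v in var]
    sw = sum(w)
    swt = sum(wi * ti for wi, ti in zip(w, t))
    swtt = sum(wi * ti * ti for wi, ti in zip(w, t))
    det = sw * swtt - swt ** 2
    values = []
    for _ in range(n_trials):
        y = [beta0 + beta1 * ti + rng.gauss(0.0, vi ** 0.5)
             for ti, vi in zip(t, var)]
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swty = sum(wi * ti * yi for wi, ti, yi in zip(w, t, y))
        b1 = (sw * swty - swt * swy) / det                   # ML slope
        b0 = (swy - b1 * swt) / sw                           # ML intercept
        values.append(sum(wi * (yi - b0 - b1 * ti) ** 2
                          for wi, ti, yi in zip(w, t, y)))
    return values

values = simulate_r_ml()
mean_r = sum(values) / len(values)
inside = sum(2.167 < v < 14.07 for v in values) / len(values)

print("mean R_ML =", round(mean_r, 3), "(expected n - p = 7)")
print("fraction between chi-square limits =", round(inside, 3), "(expected 0.90)")
```

If the simulation were repeated with a wrong weighting (for example, treating all variances as equal), the mean of $R_{ML}$ would drift away from $n - p$, which is the basis of the check described above.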


Example 6.5.6

For a second set of data similar to those given in Example 6.5.4, ML estimation
has been used. There are 10 separate measurements of the thermal conductivity as
a function of temperature and power input. The model used is the one found from
the preceding analysis,

$k = \beta_1 + \beta_2 T + \beta_3 T^2$

and the errors in $k$ are assumed to be independent and the $\boldsymbol{\psi}$ matrix based on the
results of Example 6.5.4 is

$\boldsymbol{\psi} = 0.1616\,\mathrm{diag}[\,1 \;\; 4 \;\; 1 \;\; 4 \;\; 1 \;\; 4 \;\; 1 \;\; 4 \;\; 1 \;\; 4\,]$

For the analysis based on these data this model and $\boldsymbol{\psi}$ yielded $R_{ML} = 13.52$.
Assume that the standard assumptions indicated by 11011111 are valid and
investigate the "goodness of fit" using the $\chi^2$ distribution.





Solution 

Since there are 10 observations ($n = 10$) and three parameters, there are $n - p = 7$ degrees of
freedom, for which a table of the $\chi^2$ distribution gives

$P(\chi^2 < 14.07) = .95, \quad P(\chi^2 < 2.167) = .05$

This means that only in 5% of similar cases would $R_{ML}$ exceed 14.07. Also
only in 5% of the cases would $R_{ML}$ be less than 2.167. Since $2.167 < 13.52 < 14.07$,
we have an indication at the 10% level of significance that the model and $\boldsymbol{\psi}$ are
satisfactory.

An important source of error which could cause $R_{ML}$ to be either too
small or too large (compared to an interval based on the $\chi^2$ distribution) is
an incorrect choice of $\boldsymbol{\psi}$. For example, $\boldsymbol{\psi}$ might be assumed to be diagonal
while the measurement errors are correlated, causing $\boldsymbol{\psi}$ to be nondiagonal.
For this reason it is always advisable to investigate if the residuals suggest
correlation among the errors. Such correlation can be inherent in the
measurements themselves or a result of selection of an incorrect model.

Often the model is dictated by physical considerations. In other cases
there may be several models that fit equally well.


6.6 LINEAR MAXIMUM A POSTERIORI ESTIMATOR (MAP)

6.6.1 Introduction 

Maximum a posteriori estimation utilizes prior information regarding the 
parameters in addition to information regarding the measurement errors. 
Inclusion of prior parameter information can have the beneficial effect of 
reduction of variances of parameter estimators. 

In Section 5.4 two MAP cases are considered. The first is for random 
parameters. One convenient way to visualize this case is to think of some 
product that is produced in “batches,” each of which is different. The prior 
information is relative to the mean and variance of these batches. The 
measurements Y, however, deal with a specific (new) batch. 

In the second MAP case, the prior information is visualized as subjective, that is, as the belief of an investigator regarding the parameters. This could include means and variances. In this case the parameters are not random; for example, a parameter of interest might be a fundamental constant such as the speed of light. The knowledge about the parameters is nonetheless probabilistic, and this leads us to view the parameters as having probability distributions, similar to the random parameter case.



270 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION


In both cases, or views, the derived parameter estimators are formally the same but the meaning of the various terms may be different. In order to be succinct, the case of random parameters is treated first and the results for subjective information are given without derivations.

6.6.2 Assumptions

Let us consider the case of MAP estimation involving a random parameter vector. We shall investigate estimation for the standard assumptions denoted 11--1112. Each assumption except for the last is the same as used for ML estimation in Section 6.5. Mathematically the assumptions can be given as

$$Y = E(Y|\beta) + \varepsilon, \qquad \varepsilon \sim N(0,\psi)$$ (6.6.1a,b)

$$E(Y|\beta) = X\beta, \qquad \beta \sim N(\mu_\beta, V_\beta), \qquad \mathrm{cov}(\beta,\varepsilon) = 0$$ (6.6.1c,d,e)

where $\psi$, $\mu_\beta$, and $V_\beta$ are known. Note that the distributions for both $\beta$, the random parameter column vector, and $\varepsilon$ are normal.


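6.6.3 Estimation Involving Random Parameters

In MAP estimation the estimated parameter vector is the one that maximizes the probability density $f(\beta|Y)$. This density is related to $f(Y|\beta)$ and that for the random parameter, $f(\beta)$, by

$$f(\beta|Y) = \frac{f(Y|\beta)\,f(\beta)}{f(Y)}$$ (6.6.2)

which is a form of Bayes's theorem. Recall that $f(Y|\beta)$ is the same density used in maximum likelihood estimation; for the given assumptions it is given by (6.1.66). For the conditions indicated by (6.6.1d), $f(\beta)$ is

$$f(\beta) = (2\pi)^{-p/2}\,|V_\beta|^{-1/2}\exp\left[-\tfrac{1}{2}(\beta-\mu_\beta)^T V_\beta^{-1}(\beta-\mu_\beta)\right]$$ (6.6.3)

where $p$ is the number of parameters. The probability density $f(Y)$ need not be explicitly given since it is not a function of $\beta$.

The maximum of $f(\beta|Y)$ given by (6.6.2) occurs at the same parameter values as the maximum of its natural logarithm,

$$\ln[f(\beta|Y)] = -\tfrac{1}{2}\left[(n+p)\ln 2\pi + \ln|\psi| + \ln|V_\beta| + S_{MAP}\right] - \ln f(Y)$$ (6.6.4a)

$$S_{MAP} = (Y-X\beta)^T\psi^{-1}(Y-X\beta) + (\beta-\mu_\beta)^T V_\beta^{-1}(\beta-\mu_\beta)$$ (6.6.4b)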





When the only parameters of interest occur in $\beta$, not in $\psi$ for example, maximizing $f(\beta|Y)$ can be accomplished by minimizing $S_{MAP}$. Following the usual procedure of taking the matrix derivatives with respect to $\beta$, etc., one finds using (6.6.4b), (6.1.33), and (6.1.26b),

$$\nabla_\beta S_{MAP}\big|_{b_{MAP}} = 2\left[-X^T\psi^{-1}Y + X^T\psi^{-1}Xb_{MAP} - V_\beta^{-1}\mu_\beta + V_\beta^{-1}b_{MAP}\right] = 0$$ (6.6.5)

Solving for the estimator yields

$$b_{MAP} = P_{MAP}\left[X^T\psi^{-1}Y + V_\beta^{-1}\mu_\beta\right], \qquad P_{MAP}^{-1} \equiv X^T\psi^{-1}X + V_\beta^{-1}$$ (6.6.6a,b)

Notice the definition of $P_{MAP}$ given by (6.6.6b). By adding and subtracting $2X^T\psi^{-1}X\mu_\beta$ in (6.6.5), the additional expression of

$$b_{MAP} = \mu_\beta + P_{MAP}X^T\psi^{-1}(Y - X\mu_\beta)$$ (6.6.6c)

can be given for $b_{MAP}$. In this form the second term on the right side can be considered as a correction to the known mean value of the random parameter vector; it is a result of the new information, $Y$, for a given “batch.” For the case of Model 1, $\eta_i=\beta_1X_i$, (6.6.6a,b,c) reduces to (5.4.11a,b). Since $Y=X\beta+\varepsilon$, $E(\beta)=\mu_\beta$, and $E(\varepsilon)=0$, use of (6.6.6c) shows that the expected value of $b_{MAP}$ is

$$E(b_{MAP}) = \mu_\beta$$ (6.6.7)

and thus $b_{MAP}$ is a biased estimator. Even so it is recommended whenever appropriate owing to the reduced error covariance matrix as shown below. An example of MAP estimation is given in Section 6.7.4 in connection with the sequential MAP method.

The covariance matrix of interest is that of $b_{MAP}-\beta$ as explained in connection with (5.4.12) and (5.4.13); $b_{MAP}-\beta$ is the difference between the estimator and the parameter vector for the particular batch. Utilizing (6.6.6a) we can write

$$b_{MAP}-\beta = P_{MAP}X^T\psi^{-1}(X\beta+\varepsilon) - \beta + P_{MAP}V_\beta^{-1}\mu_\beta$$ (6.6.8a)

$$= (P_{MAP}X^T\psi^{-1}X - I)\beta + P_{MAP}X^T\psi^{-1}\varepsilon + P_{MAP}V_\beta^{-1}\mu_\beta$$ (6.6.8b)

Now taking the covariance of $b_{MAP}-\beta$, the error covariance matrix of $b_{MAP}$, yields

$$\mathrm{cov}(b_{MAP}-\beta) = (P_{MAP}X^T\psi^{-1}X - I)V_\beta(X^T\psi^{-1}XP_{MAP} - I) + P_{MAP}X^T\psi^{-1}XP_{MAP}$$ (6.6.9)

where (6.1.40) is used. Expanding the right side of (6.6.9) produces

$$\mathrm{cov}(b_{MAP}-\beta) = P_{MAP}X^T\psi^{-1}XV_\beta X^T\psi^{-1}XP_{MAP} - P_{MAP}X^T\psi^{-1}XV_\beta - V_\beta X^T\psi^{-1}XP_{MAP} + V_\beta + P_{MAP}X^T\psi^{-1}XP_{MAP}$$ (6.6.10a)

$$= -V_\beta X^T\psi^{-1}XP_{MAP} + V_\beta$$

$$= -V_\beta X^T\psi^{-1}XP_{MAP} + V_\beta P_{MAP}^{-1}P_{MAP}$$ (6.6.10b)

$$= V_\beta\left[-X^T\psi^{-1}X + P_{MAP}^{-1}\right]P_{MAP}$$

$$= V_\beta\left[-X^T\psi^{-1}X + X^T\psi^{-1}X + V_\beta^{-1}\right]P_{MAP}$$ (6.6.10c)

and thus the covariance of $b_{MAP}-\beta$ is

$$\mathrm{cov}(b_{MAP}-\beta) = P_{MAP} = \left[X^T\psi^{-1}X + V_\beta^{-1}\right]^{-1}$$ (6.6.11)

This is valid for the stated assumptions, which are denoted 11--1112. Note that the effect of the random parameter behavior disappears as $X^T\psi^{-1}X$ is made large, which usually results from a large number of measurements. A summary of the MAP estimation equations is given in Appendix B.
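The estimator (6.6.6a,b) and its alternative form (6.6.6c) can be compared numerically. The sketch below is an illustration only: it assumes two parameters and a diagonal $\psi$, the data, prior, and function names are hypothetical, and it uses plain Python lists rather than a matrix library.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested sequences."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def map_estimate(X, Y, var, mu, V):
    """b_MAP and P_MAP from (6.6.6a,b) for p = 2 and psi = diag(var)."""
    Vinv = inv2(V)
    # Information matrix X^T psi^-1 X + V_beta^-1.
    M = [[Vinv[r][c] + sum(x[r] * x[c] / s for x, s in zip(X, var))
          for c in range(2)] for r in range(2)]
    # Right side X^T psi^-1 Y + V_beta^-1 mu_beta.
    rhs = [sum(x[r] * y / s for x, y, s in zip(X, Y, var))
           + Vinv[r][0] * mu[0] + Vinv[r][1] * mu[1] for r in range(2)]
    P = inv2(M)
    b = [P[r][0] * rhs[0] + P[r][1] * rhs[1] for r in range(2)]
    return b, P

def map_correction_form(X, Y, var, mu, P):
    """b_MAP from (6.6.6c): prior mean plus a correction from Y - X mu."""
    g = [sum(x[r] * (y - x[0] * mu[0] - x[1] * mu[1]) / s
             for x, y, s in zip(X, Y, var)) for r in range(2)]
    return [mu[r] + P[r][0] * g[0] + P[r][1] * g[1] for r in range(2)]

# Hypothetical data for the model eta = beta1 + beta2 * t.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
Y = [1.1, 2.9, 5.2]
var = [0.04, 0.04, 0.04]            # diagonal of psi
mu = [1.0, 2.0]                     # prior mean mu_beta
V = [[0.5, 0.0], [0.0, 0.5]]        # prior covariance V_beta

b_map, P_map = map_estimate(X, Y, var, mu, V)
b_alt = map_correction_form(X, Y, var, mu, P_map)
print(b_map, b_alt, P_map)
```

Both forms should return the same vector, and the diagonal of `P_map` should be smaller than that of `V`, in line with the reduced error covariance noted in the text.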


6.6.4 Estimation with Subjective Information

Consider now the case of constant parameters with subjective information about them. This information can be expressed in probabilistic terms if we imagine that the parameters have probability distributions, that is, are random. A close analogue can be drawn with the random parameter case. Now $\mu_\beta$ becomes the prior parameter vector based on belief and $V_\beta$ is the covariance matrix for a normal distribution. That is, our knowledge or belief regarding $\beta$ is expressed by the probability density, $f(\beta)=N(\mu_\beta,V_\beta)$. We can conceive that this knowledge has been developed from investigation of many tests on similar “batches,” from literature values, or from other sources. This information is independent of that obtained from the new information contained in $Y$. With these assumptions and those used regarding $Y$ above, the assumptions are denoted 11--1113. The estimator and covariance matrix are then the same as derived in Section 6.6.3. Now, however, $\mu_\beta$ and $V_\beta$ have different meanings.

Let us briefly consider certain implications of the MAP estimators for subjective prior information. Suppose first that the prior information is very poor. This implies that the $V_\beta$ matrix has large diagonal components. If $V_\beta$ has large diagonal components, $b_{MAP}$ given by (6.6.6a) approaches the $b_{ML}$ expression given by (6.5.2) and $\mathrm{cov}(b_{MAP}-\beta)$ approaches $\mathrm{cov}(b_{ML})$ given by (6.5.5). Hence a computer program developed for MAP estimation could also be used for ML estimation (for the assumptions denoted 11--1111). The same program could also be used for OLS estimation by letting $V_\beta$ have large diagonal components and by replacing $\psi$ by $\sigma^2I$. If $\psi$ were equal to $\sigma^2I$ and all the standard assumptions were valid, we would have $b_{MAP}\approx b_{ML}=b_{LS}$.

If the standard assumptions are not valid, the same estimator for OLS given by (6.2.5) is obtained from (6.6.6) by simply replacing $\psi^{-1}$ by $I$ and by setting $\mu_\beta=0$ and $V_\beta^{-1}=0$; in this case, $\mathrm{cov}(b_{MAP}-\beta)$ does not give the correct relation for $\mathrm{cov}(b_{LS})$, however.

In (6.6.6) all components of $b_{MAP}$ can be uniquely found if

$$|P_{MAP}^{-1}| = |X^T\psi^{-1}X + V_\beta^{-1}| \neq 0$$ (6.6.12)

Hence for MAP estimation it is neither necessary that $|X^TX|\neq0$ nor that $n\geq p$. In both ML and LS estimation, $\beta$ cannot be estimated if $|X^TX|=0$, which is the case if $n<p$ or if there is linear dependence among the sensitivity coefficients. The fact that $b_{MAP}$ estimates can be obtained for $n$ as small as 1 is used in the sequential method discussed in the next section.
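Condition (6.6.12) can be illustrated for a single observation of a two-parameter model. The numbers below are hypothetical, chosen only to show that $|X^TX|$ vanishes for $n=1$ while the MAP matrix stays nonsingular once a prior covariance is supplied.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested sequences."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

x = (1.0, 20.0)                       # one observation (n = 1), two parameters (p = 2)
sigma2 = 0.04                         # variance of that observation
V = [[2.0, 0.0], [0.0, 1.0e-4]]       # hypothetical diagonal prior covariance V_beta

# X^T X for a single row x: rank one, hence singular.
XtX = [[x[r] * x[c] for c in range(2)] for r in range(2)]

# P_MAP^-1 = X^T psi^-1 X + V_beta^-1, with V_beta diagonal here.
Vinv = [[1.0 / V[0][0], 0.0], [0.0, 1.0 / V[1][1]]]
Pinv = [[XtX[r][c] / sigma2 + Vinv[r][c] for c in range(2)] for r in range(2)]

print(det2(XtX))    # zero: LS and ML cannot estimate both parameters
print(det2(Pinv))   # positive: the MAP estimate is still unique
```

The prior information supplies the missing rank, which is why a MAP update remains well defined with only one measurement.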

6.6.5 Uncertainty in $\psi$

One major difficulty in using ML and MAP estimation is that the standard assumptions may not be valid. In particular, the $\psi$ matrix may not be known. There are certain checks and corrections that can be used when there is some uncertainty in $\psi$.

One check involves the expected value of $S_{MAP}$, which is called the prior value because the new data are not yet used to obtain parameter estimates. Using (6.1.50), $E(S_{MAP})$ becomes

$$E(S_{MAP}) = \mathrm{tr}[\psi^{-1}\psi] + \mathrm{tr}[V_\beta^{-1}V_\beta] = n+p$$ (6.6.13)

Now we also know for ML that $E(S_{ML})=n$ and $E(R_{ML})=n-p$. Hence it is reasonable to assume that the expected value of $R_{MAP}$, the minimum value of $S_{MAP}$, is about equal to $n$. Again, if $n\gg p$, the difference between $n+p$ and $n$ is relatively small. Hence for many cases $R_{MAP}$ can be considered to have a $\chi^2$ distribution with $n$ degrees of freedom. (This is true provided the assumptions designated 11--1112 or 11--1113 are valid.) A related example is given by Example 6.5.6.

Suppose next that there is uncertainty in $\psi$; let $\psi$ be equal to $\sigma^2\Omega$ where $\sigma^2$ is unknown and $\Omega$ is known. Both $\sigma^2$ and $\beta$ can be estimated by maximizing $f(\beta|Y)$ with respect to $\sigma^2$ to find its MAP estimator and with respect to $\beta$ to find the mode of the distribution of $\beta$. After taking the derivative of $\ln f(\beta|Y)$ given by (6.6.4) with respect to $\beta$ and $\sigma^2$ and setting the resulting equations equal to zero where $\beta=b_{MAP}$ and $\sigma^2=\hat\sigma^2_{MAP}$, we obtain the set of $p+1$ equations given by (6.6.14a,b),

$$\hat\sigma^{-2}_{MAP}\left[-X^T\Omega^{-1}Y + X^T\Omega^{-1}Xb_{MAP}\right] - V_\beta^{-1}\mu_\beta + V_\beta^{-1}b_{MAP} = 0$$ (6.6.14a)

$$n\hat\sigma^2_{MAP} - R^{\Omega}_{MAP} = 0$$ (6.6.14b)

$$R^{\Omega}_{MAP} \equiv (Y-Xb_{MAP})^T\Omega^{-1}(Y-Xb_{MAP})$$ (6.6.14c)

Unfortunately these equations are no longer linear in $b_{MAP}$ and $\hat\sigma^2_{MAP}$. Nevertheless we can solve them to obtain

$$b_{MAP} = \mu_\beta + \left[\{X^T\Omega^{-1}X\} + \hat\sigma^2_{MAP}V_\beta^{-1}\right]^{-1}X^T\Omega^{-1}(Y-X\mu_\beta)$$ (6.6.15a)

$$\hat\sigma^2_{MAP} = \frac{R^{\Omega}_{MAP}}{n}$$ (6.6.15b)

The nonlinearity has not been removed, but two different approaches are apparent from these equations. First, if it happens that $\hat\sigma^2_{MAP}V_\beta^{-1}$ is known, the nonlinearity disappears and a direct solution is given for $b_{MAP}$ and $\hat\sigma^2_{MAP}$. This might occur when two (or more) sets of data are analyzed separately but there is a common $\sigma^2$ for all the data. Second, an iterative procedure is suggested by (6.6.15). If an initial guess for $\hat\sigma^2_{MAP}$ is available, it is used in (6.6.15a), and then an improved value of $\hat\sigma^2_{MAP}$ is found using (6.6.15b), whereupon this value is used in (6.6.15a), etc., until the changes in $b_{MAP}$ and $\hat\sigma^2_{MAP}$ are negligible. If the initial value of $\hat\sigma^2_{MAP}$ were zero, the first estimator for $\beta$ would be $b_{ML}$. At the other extreme of $\hat\sigma^2_{MAP}\to\infty$, $b_{MAP}\to\mu_\beta$. In the iterative procedure, certain matrix products, such as the one in braces in (6.6.15a), need be evaluated only once.

For another case with uncertainty in $\psi$ see Problem 6.29, where $\psi=\sigma^2\Omega$ and $V_\beta=\sigma^2V$ with $\sigma^2$ unknown and $\Omega$ and $V$ known.
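The iterative procedure suggested by (6.6.15) can be sketched as follows. This is a minimal illustration under stated assumptions: two parameters, a diagonal $\Omega$, hypothetical data and prior, and a simple relative-change stopping rule that is not taken from the text.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested sequences."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def estimate_b_and_sigma2(X, Y, omega, mu, V, sigma2=1.0, tol=1e-12, max_iter=200):
    """Alternate between (6.6.15a) and (6.6.15b); p = 2, Omega = diag(omega)."""
    n = len(Y)
    Vinv = inv2(V)
    # These products do not depend on sigma2, so they are formed only once.
    A = [[sum(x[r] * x[c] / w for x, w in zip(X, omega)) for c in range(2)]
         for r in range(2)]
    g = [sum(x[r] * (y - x[0] * mu[0] - x[1] * mu[1]) / w
             for x, y, w in zip(X, Y, omega)) for r in range(2)]
    b = list(mu)
    for _ in range(max_iter):
        M = [[A[r][c] + sigma2 * Vinv[r][c] for c in range(2)] for r in range(2)]
        Minv = inv2(M)
        b = [mu[r] + Minv[r][0] * g[0] + Minv[r][1] * g[1] for r in range(2)]
        R = sum((y - x[0] * b[0] - x[1] * b[1]) ** 2 / w
                for x, y, w in zip(X, Y, omega))
        new_sigma2 = R / n                     # update from (6.6.15b)
        converged = abs(new_sigma2 - sigma2) <= tol * new_sigma2
        sigma2 = new_sigma2
        if converged:
            break
    return b, sigma2

# Hypothetical data scattered about eta = 1 + 2 t, with a prior centered there.
X = [(1.0, float(t)) for t in range(5)]
Y = [1.0, 3.1, 4.9, 7.2, 8.8]
omega = [1.0] * 5
mu = [1.0, 2.0]
V = [[0.5, 0.0], [0.0, 0.5]]

b_hat, sigma2_hat = estimate_b_and_sigma2(X, Y, omega, mu, V)
print(b_hat, sigma2_hat)
```

For data like these the prior term $\hat\sigma^2_{MAP}V_\beta^{-1}$ is small compared with $X^T\Omega^{-1}X$, so the iteration settles after a few passes; convergence is not guaranteed in general by this sketch.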



6.7 SEQUENTIAL ESTIMATION 


6.7.1 Introduction 

The sequential estimation procedures developed in this section refer to 
continually updating parameter estimates as new observations are added. 
One of the most important advantages of this method is that matrix 
inverses may not be needed. Another is that the computer memory storage 
can be greatly reduced. Moreover, the method can be utilized to produce 
an “on-line” method of parameter estimation for dynamic processes. These 
and other advantages are discussed further at the end of this section. 

The mathematical form derived for MAP estimation, (6.6.6), includes those derived for ML and OLS estimation. For ML estimation with the standard assumptions of 11--1111, (6.6.6) mathematically reduces to (6.5.2) if $V_\beta^{-1}=0$. For the subjective prior information case, this corresponds to no prior information. In this case the value of $\mu_\beta$ is unimportant (provided $V_\beta^{-1}\mu_\beta=0$). For (6.6.6) to reduce to the estimator given for OLS estimation, (6.2.5), we may set $\psi=\sigma^2I$ and $V_\beta^{-1}=0$ in (6.6.6). Whether or not these assumptions are valid, the OLS estimator is obtained. If the assumptions denoted 11111-11 ($\sigma^2$ need not be known) are valid, then the estimates obtained using $b_{MAP}$ will equal those given by $b_{ML}$ and $b_{OLS}$. In the sequential procedure we use the fact that the ML and OLS estimates can be very closely approximated as indicated above if the $V_\beta$ matrix is diagonal with large diagonal components. The sequential procedure also includes ML and OLS estimation when a set of data has been analyzed to estimate the parameters and then later this information is combined with more data; the information for the first set of data, summarized by $b_{ML}$ and $(X^T\psi^{-1}X)^{-1}$ (or $b_{OLS}$ and $(X^TX)^{-1}$), can be mathematically treated in the same manner as $\mu_\beta$ and $V_\beta$ in MAP estimation. See Section 5.3.4.

Two different sequential procedures are given. The first is the direct method; it involves matrix inverses of dimensions $p\times p$. In the alternate, and recommended, formulation the inverses have dimensions $m\times m$, where $m$ is the number of responses at each “time.” In the case of a single response, $m$ is equal to one and only scalar inverses are required.

6.7.2 Direct Method 

Since the MAP estimator can mathematically include ML and OLS 
estimators and since estimates can be obtained for n as small as one, the 
sequential estimator given by (6.6.6) is used as a building block for the 
sequential method. 





One important assumption for sequential MAP and ML estimation is that the measurements are independent in “time.” That is, for the multiresponse case, $\psi$ can be partitioned into the diagonal matrix

$$\psi = \mathrm{diag}[\Phi_1\ \Phi_2\ \cdots\ \Phi_n]$$ (6.7.1)

where $\Phi_i$ is $m\times m$ and $m$ is the number of observations taken at each time. The measurements at each time may be correlated since $\Phi_i$ need not be diagonal. If ordinary least squares estimation is used, the measurement errors in $Y$ may be correlated in time since in OLS estimation the matrix $\psi$ in (6.6.6) is replaced by $I$; in this case $P_{MAP}$ would not yield the covariance of $b_{OLS}$.

Sequential MAP and ML estimation can be used when $\psi$ is not given by (6.7.1), but a transformation of the measurements is necessary to produce pseudo-observations that are uncorrelated in time. See Section 6.9.

A sequential estimator can be derived by letting

$$b_{MAP}\to b_{i+1},\quad P_{MAP}\to P_{i+1},\quad \mu_\beta\to b_i,\quad V_\beta\to P_i,\quad X\to X_{i+1},\quad Y\to Y_{i+1},\quad \psi\to\Phi_{i+1}$$ (6.7.2)

and introducing into (6.6.6) to find

$$b_{i+1} = b_i + P_{i+1}X_{i+1}^T\Phi_{i+1}^{-1}\left[Y_{i+1} - X_{i+1}b_i\right]$$ (6.7.3)

$$P_{i+1} = \left[X_{i+1}^T\Phi_{i+1}^{-1}X_{i+1} + P_i^{-1}\right]^{-1}$$ (6.7.4)

The $i$ subscript refers to “time” (or whatever the independent variable is in terms of which measurements are being added). Thus $b_{i+1}$ is an estimator for all $p$ parameters based on the data $Y_1,Y_2,\ldots,Y_{i+1}$ as well as on the prior information, if any. In the sequential procedure (6.7.3,4) are used for $i+1=1,2,\ldots,n$. In the above equations $X_{i+1}$ is an $m\times p$ matrix and $Y_{i+1}$ is an $m\times1$ vector. In order to use the above formulation, it is necessary to invert the $p\times p$ matrix in (6.7.4) and the $m\times m$ matrix $\Phi_{i+1}$ at each time.

6.7.3 Sequential Method Using Matrix Inversion Lemma

The labor in finding the inverses of the matrices in (6.7.4) can be reduced if $m<p$ by using the matrix identities

$$P_{i+1} = \left[X_{i+1}^T\Phi_{i+1}^{-1}X_{i+1} + P_i^{-1}\right]^{-1} = P_i - P_iX_{i+1}^T\left(X_{i+1}P_iX_{i+1}^T + \Phi_{i+1}\right)^{-1}X_{i+1}P_i$$ (6.7.5a)

$$P_{i+1}X_{i+1}^T\Phi_{i+1}^{-1} = P_iX_{i+1}^T\left(X_{i+1}P_iX_{i+1}^T + \Phi_{i+1}\right)^{-1}$$ (6.7.5b)

See Appendix 6B for a derivation of these equations; (6.7.5a) is known as the matrix inversion lemma. Note that even though $P_{i+1}$ is a $p\times p$ matrix, the matrix that must be inverted on the right sides of (6.7.5a,b) is $m\times m$. By introducing (6.7.5) into (6.7.3,4) we obtain

$$A_{i+1} = P_iX_{i+1}^T$$ (6.7.6a)

$$\Delta_{i+1} = \Phi_{i+1} + X_{i+1}A_{i+1}$$ (6.7.6b)

$$K_{i+1} = A_{i+1}\Delta_{i+1}^{-1}$$ (6.7.6c)

$$e_{i+1} = (Y_{i+1} - X_{i+1}b_i)$$ (6.7.6d)

$$b_{i+1} = b_i + K_{i+1}e_{i+1}$$ (6.7.6e)

$$P_{i+1} = P_i - K_{i+1}A_{i+1}^T$$ (6.7.6f)

where $K_{i+1}$ is sometimes called the gain matrix. This gives a general sequential procedure that can be used for OLS, WLS, Gauss-Markov, ML, and MAP estimation. The same computer program can be used for each.

Parenthetically we note that the same computer program can also provide a filter. That is, the estimator $b_{i+1}$ can be used to find the best estimate of $Y_{i+1}$, designated $\hat Y_{i+1}$, based on all the data until and including time $i+1$. Notice that $\hat Y_{i+1}=X_{i+1}b_{i+1}$ is not the same vector as would be obtained from using all the data ($i=1,2,\ldots,n$) to evaluate $b$; when we use all the data, as we usually do, the $\hat Y_i$ values are termed smoothed values rather than filtered values.

In starting the sequential procedure given by (6.7.6) the $b_0$ and $P_0$ matrices are required. For MAP estimation $b_0$ is $\mu_\beta$ and $P_0=V_\beta$. For ML and OLS estimation $b_0$ may be set equal to a zero column vector and $P_0$ is made to be a diagonal matrix; the $j$th diagonal term of $P_0$ should be large compared with the square of the $j$th parameter estimate, $b_{j,n}^2$. The $P_0$ matrix for the ML and OLS cases is discussed further below. WLS and Gauss-Markov estimates are obtained in a similar way.

Another expression for $P_{i+1}$ given by Mendel [16, p. 128] is

$$P_{i+1} = \left[I - K_{i+1}X_{i+1}\right]P_i\left[I - K_{i+1}X_{i+1}\right]^T + K_{i+1}\Phi_{i+1}K_{i+1}^T$$ (6.7.7)

This expression can be shown to be equal to that given by (6.7.6f) by introducing the definitions of $K_{i+1}$ and $\Delta_{i+1}$. It is true that (6.7.7) is a more time-consuming expression to evaluate than (6.7.6f), but Mendel shows that it is less sensitive to propagation of errors in $K$ than is (6.7.6f).
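The equality of the two covariance updates can be checked numerically for a single observation. In this sketch the prior covariance, sensitivities, and variance are hypothetical, the function name is an assumption, and the case is restricted to $p=2$ and $m=1$ so that $\Phi_{i+1}$ reduces to a scalar variance.

```python
def covariance_updates(P, x, sigma2):
    """Return P_{i+1} from the short form (6.7.6f) and from (6.7.7), p = 2, m = 1."""
    A = [P[0][0] * x[0] + P[0][1] * x[1],
         P[1][0] * x[0] + P[1][1] * x[1]]                 # A = P x^T
    delta = sigma2 + x[0] * A[0] + x[1] * A[1]            # scalar Delta
    K = [A[0] / delta, A[1] / delta]                      # gain vector
    # Short form: P - K A^T.
    P_short = [[P[u][v] - K[u] * A[v] for v in range(2)] for u in range(2)]
    # Longer form: (I - K x) P (I - K x)^T + K sigma2 K^T.
    F = [[(1.0 if u == v else 0.0) - K[u] * x[v] for v in range(2)]
         for u in range(2)]
    FP = [[sum(F[u][k] * P[k][v] for k in range(2)) for v in range(2)]
          for u in range(2)]
    P_long = [[sum(FP[u][k] * F[v][k] for k in range(2)) + sigma2 * K[u] * K[v]
               for v in range(2)] for u in range(2)]
    return P_short, P_long

P_prior = [[0.5, 0.1], [0.1, 0.3]]     # hypothetical P_i
x_new = (1.0, 2.0)                     # hypothetical sensitivities X_{i+1}
P_a, P_b = covariance_updates(P_prior, x_new, 0.04)
print(P_a)
print(P_b)
```

With an exact gain the two results agree to rounding; the longer form is symmetric by construction, which is one way to read the remark about its lower sensitivity to errors in $K$.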





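6.7.3.1 Estimation with Only One Observation at Each Time ($m=1$)

An important simplification occurs in the sequential form given by (6.7.6) when there is a single observation at each time. This is because $\Delta_{i+1}$ is a scalar and thus its inverse is a scalar. Also note that

$$\Phi_{i+1} = \sigma_{i+1}^2$$

where $\sigma_{i+1}^2$ is the variance of $Y_{i+1}$ for ML and MAP estimation, but is replaced by unity for OLS estimation.

The sequential procedure for $m=1$ implied by (6.7.6) is

$$A_{u,i+1} = \sum_{k=1}^{p}X_{i+1,k}P_{uk,i}$$ (6.7.8a)

$$\Delta_{i+1} = \sigma_{i+1}^2 + \sum_{k=1}^{p}X_{i+1,k}A_{k,i+1}$$ (6.7.8b)

$$K_{u,i+1} = \frac{A_{u,i+1}}{\Delta_{i+1}}$$ (6.7.8c)

$$e_{i+1} = Y_{i+1} - \sum_{k=1}^{p}X_{i+1,k}b_{k,i}$$ (6.7.8d)

$$b_{u,i+1} = b_{u,i} + K_{u,i+1}e_{i+1}$$ (6.7.8e)

$$P_{uv,i+1} = P_{uv,i} - K_{u,i+1}A_{v,i+1}, \qquad v=1,2,\ldots,p$$ (6.7.8f)

where $u=1,2,\ldots,p$. It is important to observe that there are no simultaneous equations to solve or nonscalar matrices to invert with this method. This is a somewhat surprising result and it is true for any value of $p\geq1$. This procedure does require starting values for $b$ and $P$, however.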
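The single-observation procedure translates directly into a short routine for any number of parameters. The sketch below follows the scalar form of (6.7.6); the function name, the prior, and the three data points are hypothetical and serve only to show that no matrix inverse is needed.

```python
def sequential_update(b, P, x, y, sigma2):
    """One step of the sequential procedure with a single observation (m = 1).

    b: list of p estimates, P: p x p list of lists, x: sensitivities of the
    new observation y, sigma2: variance of y (use 1.0 for OLS).
    """
    p = len(b)
    A = [sum(P[u][k] * x[k] for k in range(p)) for u in range(p)]   # A = P x^T
    delta = sigma2 + sum(x[k] * A[k] for k in range(p))             # scalar
    K = [A[u] / delta for u in range(p)]                            # gain
    e = y - sum(x[k] * b[k] for k in range(p))                      # residual
    b_new = [b[u] + K[u] * e for u in range(p)]
    P_new = [[P[u][v] - K[u] * A[v] for v in range(p)] for u in range(p)]
    return b_new, P_new

# Hypothetical prior and data for the model eta = beta1 + beta2 * t.
b = [1.0, 2.0]                           # starting values b_0 (prior mean)
P = [[0.5, 0.0], [0.0, 0.5]]             # starting matrix P_0 (prior covariance)
data = [((1.0, 0.0), 1.1), ((1.0, 1.0), 2.9), ((1.0, 2.0), 5.2)]
sigma2 = 0.04

for x, y in data:
    b, P = sequential_update(b, P, x, y, sigma2)
print(b)
print(P)
```

After the last observation, `b` and `P` should match what a single batch solution of (6.6.6) would give for the same prior and data, since each step applies the MAP estimator with the previous result acting as the prior.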

Example 6.7.1

Give a set of equations based on (6.7.8) for two parameters that is appropriate for a small programmable calculator. Also indicate the memory locations.

Solution

Before the calculations, values can be stored for $b_1$, $b_2$, $P_{11}$, $P_{12}$, $P_{22}$, $\sigma^2$, $X_1$, and $X_2$. The first five are for “time” index zero whereas $X_1$ and $X_2$ are for index 1, that is, $X_{11}$ and $X_{12}$. The memory registers can be assigned as follows:

Register   0       1       2          3          4          5            6       7       8       9
Contents   $b_1$   $b_2$   $P_{11}$   $P_{12}$   $P_{22}$   $\sigma^2$   $X_1$   $X_2$   $A_1$   $A_2$

Later in the calculations, register 5 can be used for $\Delta$ and then $e/\Delta$. A set of equations and storage locations are as follows:

$A_1 = X_1P_{11} + X_2P_{12}$                      STO 8

$A_2 = X_1P_{12} + X_2P_{22}$                      STO 9

$\Delta = A_2X_2 + A_1X_1 + \sigma^2$              STO 5

$P_{11} = -A_1^2/\Delta + P_{11}$                  STO 2

$P_{12} = -A_1A_2/\Delta + P_{12}$                 STO 3

$P_{22} = -A_2^2/\Delta + P_{22}$                  STO 4

$e/\Delta = (Y - X_1b_1 - X_2b_2)/\Delta$          STO 5

$b_1 = A_1(e/\Delta) + b_1$                        STO 0

$b_2 = A_2(e/\Delta) + b_2$                        STO 1


The $i$ subscript has been dropped but it is implied; for example, in the $P_{11}$ equation, $P_{11}$ on the right is at time $i$ whereas $P_{11}$ on the left is at time $i+1$. The above set of equations is used for each value of $i$. A special storage location for $Y$ is not necessary because it is read in and used as needed.


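Example 6.7.2

Using sequential ordinary least squares, estimate the two parameters for observations $Y$ and sensitivity matrix $X$ given by

$$Y = \begin{bmatrix} 11 \\ 4 \\ 25 \end{bmatrix}, \qquad X = \begin{bmatrix} 4 & 3 \\ 1 & 2 \\ 10 & 1 \end{bmatrix}$$

Let $b_0=0$ and $P_0=10^5I$, $10^{10}I$, and $10^{15}I$. (These large $P_0$ values simulate no prior information.)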





Solution

Since OLS is to be used, $\sigma_i^2=1$ for $i=1,2,3$. The equations to be used are given in Example 6.7.1.

$$A_{1,1} = 4(10^5) + 3(0) = 4\times10^5, \qquad A_{2,1} = 4(0) + 3(10^5) = 3\times10^5$$

$$\Delta_1 = 3\times10^5(3) + 4\times10^5(4) + 1 = 2500001$$

The rest of the calculations for the first time are given in the third column of Table 6.10 along with results for times 2 and 3. The calculations were performed using a Texas Instruments SR-56 programmable calculator which has a 12 digit accuracy. For this problem the calculator accuracy can be important since subtractions of nearly identical large values occur while calculating the $P$ values. The parameters are only slightly affected for a large range of $P_0$ matrices, such as from $10^5I$ to $10^{10}I$ for this example. If the $P_0$ matrix is $KI$ where $K>10^{12}$, however, the $P_i$ matrices are 0 for $i\geq2$ and the parameters do not change after $b_2$.


Table 6.10 Results for Example 6.7.2 Using TI SR-56 Programmable Calculator

Quantity      Exact Values       $P_0=10^5I$        $P_0=10^{10}I$      $P_0=10^{15}I$
$P_{11,1}$    -                  36000.0256         3.6 × 10^9          3.6 × 10^14
$P_{12,1}$    -                  -47999.9808        -4.8 × 10^9         -4.8 × 10^14
$P_{22,1}$    -                  64000.0144         6.4 × 10^9          6.4 × 10^14
$b_{1,1}$     -                  1.759999296        1.76                1.76
$b_{2,1}$     -                  1.31999472         1.32                1.32
$P_{11,2}$    0.52               0.51999427         0.542               0
$P_{12,2}$    -0.56              -0.55999339        -0.572              0
$P_{22,2}$    0.68               0.67999228         0.679               0
$b_{1,2}$     2                  1.9999952          2.0                 2.0
$b_{2,2}$     1                  1.0000044          1.0                 1.0
$P_{11,3}$    0.0131826742       0.0131826666       0.01311537163       0
$P_{12,3}$    -0.0225988701      -0.0225988336      -0.02206035         0
$P_{22,3}$    0.1101694915       0.1101692781       0.107167128         0
$b_{1,3}$     2.436911488        2.43691129         2.436373456         2.0
$b_{2,3}$     0.5367231638       0.5367231198       0.5462544           1.0


Physically $P=0$ implies that the variance of the parameters is zero and thus nothing more can be learned from additional data if $P=0$; hence the parameters do not change with time for $P_0=10^{15}I$ after the second data point. However, $P$ is effectively zero in this example only because of our method and the limited accuracy of the calculator. Hence, though $P_0$ can be selected from a large range of values to simulate no prior information, it can be made too large.
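The calculator sequence of Example 6.7.1 can be replayed for this example in ordinary double precision. This is a sketch under stated assumptions: the observations $Y=(11,4,25)$ and the rows of $X$ are taken here as inferred from the exact column of Table 6.10, the function name is hypothetical, and double precision carries more digits than the 12-digit calculator, so only the $P_0=10^5I$ case is expected to resemble the tabulated column.

```python
def two_parameter_step(state, X1, X2, Y, sigma2):
    """One pass through the register equations of Example 6.7.1."""
    b1, b2, P11, P12, P22 = state
    A1 = X1 * P11 + X2 * P12
    A2 = X1 * P12 + X2 * P22
    delta = A2 * X2 + A1 * X1 + sigma2
    e_over_delta = (Y - X1 * b1 - X2 * b2) / delta
    new_state = (b1 + A1 * e_over_delta,
                 b2 + A2 * e_over_delta,
                 P11 - A1 * A1 / delta,
                 P12 - A1 * A2 / delta,
                 P22 - A2 * A2 / delta)
    return new_state, (A1, A2, delta)

# Data assumed for illustration (inferred from the exact values in Table 6.10).
rows = [(4.0, 3.0, 11.0), (1.0, 2.0, 4.0), (10.0, 1.0, 25.0)]
K = 1.0e5                                   # P_0 = K I, b_0 = 0
state = (0.0, 0.0, K, 0.0, K)
history = []
for X1, X2, Y in rows:
    state, work = two_parameter_step(state, X1, X2, Y, 1.0)   # sigma^2 = 1 for OLS
    history.append((state, work))

first_state, (A1_first, A2_first, delta_first) = history[0]
b1_final, b2_final = state[0], state[1]
print(A1_first, A2_first, delta_first)
print(b1_final, b2_final)
```

The first-step quantities should reproduce $4\times10^5$, $3\times10^5$, and 2500001 from the solution, and the final estimates should lie within about $10^{-6}$ of the exact values 2.436911488 and 0.5367231638.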





Table 6.11 is given to illustrate the relative errors in the parameters at the third 
data point for different values of Pq. The large values of Po= 10^1 to 10®I lead to 
accurate estimates. Small and large values of $K$ in $P_0=KI$ can lead, however, to relatively inaccurate parameter values. Small values imply that prior parameter estimates are accurately known, which is not compatible with OLS estimation. Small or large values of $K$ should be judged relative to the values of the squares of the parameters. In the present case the parameters are about unity so that $K<1$ is
termed “small” and A:> 10^ may be termed large. Another indication that K is 
chosen sufficiently large is that $K$ is large compared with the largest diagonal term of $P_i$ for $i\geq2$ (for two parameters).


Table 6.11 Relative Errors in $b_{1,3}$ and $b_{2,3}$ for Example 6.7.2

$K$ in $P_0=KI$      Relative error in $b_{1,3}$      Relative error in $b_{2,3}$

1 

-8.14X10"° 

-7.56X10"° 

10^ 

-8.24X10"° 

-7.56X10"° 

10^ 

-8.13X10"° 

-8.20X10"° 

10’ 

4.58X10"’ 

-2.16X10"° 

10’ 

1.35X10"° 

5.40X10"'* 

10'° 

-2.21X10"'' 

0.0178 

10" 

-6.02X10"° 

0.348 

10*’ 

1.94X10"° 

-0.148 

10'° 

-0.1792 

0.863 


It can be shown that $K$ in $P_0=KI$ is too large for the two-parameter case when $\sigma_1^2/\Delta_1=10^{-n}$ and $n$ is greater than the number of significant figures used by the computer or calculator. It is not difficult to show that

$$\frac{\sigma_1^2}{\Delta_1} \approx \frac{\sigma_1^2}{K\left(X_{11}^2+X_{12}^2\right)}$$ (6.7.9)

Let $\sigma_1^2/\Delta_1$ be equal to or greater than $10^{-n_c}$, $n_c$ being the number of significant calculational digits. Also let $K=10^{n_L}$ where $K$ is large. Then for $K$ not too large, we should have

$$n_L < n_c - \log_{10}\frac{X_{11}^2+X_{12}^2}{\sigma_1^2}$$ (6.7.10)

Using the values for the above example and $n_c=12$, we find $n_L<10.6$. In other words, $K$ should be less than $10^{10.6}$ in order not to be too large. This is consistent with the results of Tables 6.10 and 6.11. To stay away from the critical number of significant figures, four less are recommended, that is, $P_0=10^6I$ in this case.
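The bound on $K$ can be evaluated directly. The sketch below assumes that the criterion reads $n_L<n_c-\log_{10}[(X_{11}^2+X_{12}^2)/\sigma_1^2]$, a form that reproduces the value 10.6 quoted for this example; the function name is hypothetical. The last lines show the underlying effect in double precision, where adding $\sigma_1^2=1$ to a sufficiently large $K(X_{11}^2+X_{12}^2)$ changes nothing.

```python
import math

def max_log10_K(x_first, sigma2_first, n_c):
    """Upper limit on n_L = log10(K) for P_0 = K I (assumed form of the criterion)."""
    return n_c - math.log10(sum(v * v for v in x_first) / sigma2_first)

# Example 6.7.2: first row of X is (4, 3), sigma_1^2 = 1, 12-digit calculator.
bound_672 = max_log10_K((4.0, 3.0), 1.0, 12)
# A first row of (1, 0) with 15 significant digits.
bound_unit_row = max_log10_K((1.0, 0.0), 1.0, 15)
print(bound_672, bound_unit_row)

# With K = 10**17 in double precision, Delta_1 = 25 K + 1 is stored as 25 K,
# so the contribution of sigma_1^2 to Delta_1 is lost entirely.
sigma2_lost = (25.0e17 + 1.0 == 25.0e17)
print(sigma2_lost)
```

When the variance term is lost in this way, the first update leaves a covariance matrix that behaves as if the first observation were exact, which is consistent with the vanishing $P$ values reported for very large $K$.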





6.7.3.2 Sequential Analysis of Example 5.2.4

Computer programs can be readily written based on (6.7.8). One advantage is that no separate method is needed for the solution of a set of simultaneous algebraic equations. Moreover, the procedure is readily modified for any number of parameters $p$. For two parameters a small programmable calculator can also be used.

A computer program was written to estimate, using sequential OLS, the parameters in the model $\eta_i=\beta_1+\beta_2X_i$ for the data of Example 5.2.4. Ordinary least squares analysis implies that the variance $\sigma_i^2$ is constant, prior parameter values are unknown, and the diagonal components of $P_0$ must be large. Since OLS analysis is unaffected by the choice of $\sigma_i^2$, replace $\sigma_i^2$ by 1 for $i=1,\ldots,9$. For simplicity, let the initial values of $b_1$ and $b_2$ be zero, as no prior information is given.
If rough estimates were available for the parameters, then the diagonal 
components of Pg could be chosen about 10' to 10* times larger If we do 
not have this information, then for this two-parameter case, (6.7.10) can be used; since $X_{11}=1$ and $X_{12}=0$, we find that $n_L<n_c$. Thus if a computer with 15 significant digit accuracy is available, $P_0$ should have diagonal terms less than $10^{15}$. It would be safer to reduce $P_0$ by four orders of magnitude, however, say to $10^{11}$. Shown in Table 6.12 are those obtained
using Pg" 10*1 but the parameters are identical to the seven decimal places 
given to those for Pg* /fl. 10‘< A’< 10". for a 15 significant digit com- 
puter Actually the values in Table 6 12 after >> I are exactly the same as 
those given by the usual least squares procedure if the data were first analyzed for the first two data points, then the first three, etc.

One way to check if $P_0$ is made large enough is to repeat the calculation with $P_0$ made larger. This is not efficient, however. Another way is to compare the diagonal components of $P_0$ and $P_n$, the matrix for all the data.


Table 6.12 Sequential Analysis of Example 5.2.4

$i$    $b_1$         $b_2$         $P_{11}$      $P_{12}$       $P_{22}$
0      0             0             10*           0              10®
1      0.2580000     0.0           1.0000000     0.0            10’
2      0.2580000     0.1708000     1.0000000     -0.1000000     0.0200000
3      0.1281667     0.2097500     0.8333333     -0.0500000     0.0050000
4      0.4197000     0.1660200     0.7000000     -0.0300000     0.0020000
5      0.8238000     0.1256100     0.6000000     -0.0200000     0.0010000
6      0.9545238     0.1158057     0.5238095     -0.0142857     0.0005714
7      0.7957500     0.1253321     0.4642857     -0.0107143     0.0003571
8      1.1195833     0.1091405     0.4166667     -0.0083333     0.0002381
9      1.2864667     0.1019883     0.3777778     -0.0066667     0.0001667




From (6.6.6b) we can write

$$P_n = \left[X^T\psi^{-1}X + P_0^{-1}\right]^{-1}$$ (6.7.11)

which for the OLS analysis above becomes

$$P_n = \left[X^TX + K^{-1}I\right]^{-1} = \left[I + \frac{(X^TX)^{-1}}{K}\right]^{-1}(X^TX)^{-1}$$ (6.7.12)

Now as $K\to\infty$, $P_n\to(X^TX)^{-1}$. Then as this condition is approached, the diagonal components of $(X^TX)^{-1}$, and hence those of $P_n$, must be small compared to $K$. Consequently we can check if $P_0=KI$ is large enough by comparing the diagonal components of $P_n$ with $K$. Note that in Table 6.12
the P,, and Pji values for i > 2 are much less than Ar= 10^. 
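The check described here amounts to forming $P_n$ from (6.7.12) and comparing its diagonal with $K$. The sketch below makes two assumptions that are not stated in this passage: the independent variable is taken as $X_i=0,10,\ldots,80$, which is consistent with the $P$ columns of Table 6.12, and $K=10^8$ is an arbitrary illustrative choice.

```python
K = 1.0e8                                   # hypothetical K in P_0 = K I
xs = [10.0 * i for i in range(9)]           # assumed X_i values, nine observations

# Elements of X^T X + I/K for the model eta = beta1 + beta2 * X.
S0 = len(xs) + 1.0 / K
S1 = sum(xs)
S2 = sum(x * x for x in xs) + 1.0 / K

# Explicit 2x2 inverse gives P_n.
det = S0 * S2 - S1 * S1
P11, P12, P22 = S2 / det, -S1 / det, S0 / det

ratio = max(P11, P22) / K                   # should be very small if K is large enough
print(P11, P12, P22, ratio)
```

Under these assumptions the diagonal of $P_n$ is of order one or smaller, many orders of magnitude below $K$, which is the signature of an adequately large $P_0$.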

Some further advantages of the sequential method are given below, compared to the usual OLS analysis illustrated by Example 5.2.4. Each advantage relates to the ability of the sequential method to provide more information than is apparent from the usual OLS analysis. First, the effect of adding a single observation is apparent. For example, the effect of the fourth observation is to make $b_1$ much larger than if only the first three observations are used. Second, decreasing changes in the parameters with $i$ show that each new observation tends to contribute less information than the previous one. See $b_2$ versus $i$ in Table 6.12.

Third, time variations of the parameters can yield insight into the accuracy of the measurements and/or adequacy of the regression function. For example, $b_1$ seems to be increasing with the $i$ index whereas $b_2$ is more constant. This increase in $b_1$ could be due to inaccurate data or to actual time dependence of the parameter. (In this example we know that the former is the case because the regression function used is the correct one.) Owing to the larger variation of $b_1$ we suspect that the relative errors in the $b_1$ estimate are greater than in the $b_2$ estimate. The possible time dependence of $b_1$ could be further investigated by adding measurements or by repeating the analysis with a new set of data. If the increase in $b_1$ persists, then a change in the regression model would be indicated.

Fourth, some conclusions can be drawn from the time variation of the parameters without any prior statistical knowledge of the measurement errors. If, however, there is statistical knowledge, more can be learned.

The sequential method also yields time variation of components of $P$. Note that $P_{11}$ is decreasing much more slowly than $P_{22}$. If the measurement errors are independent and have constant variance (or more precisely 1111-01-), $P_{11}$ is proportional to $V(b_1)$ and $P_{22}$ to $V(b_2)$. Hence the measurements for $i\geq2$ in this example are more effective in reducing errors in $b_2$ than in $b_1$.

The decrease in $P_{11}$ and $P_{22}$ with $i$ shown in Table 6.12 is necessary as indicated by the equations in Example 6.7.1 (because $A_1^2/\Delta\geq0$ and $A_2^2/\Delta\geq0$). Physically this is reasonable because added measurements increase the available information, which results in the diagonal components of $P$ decreasing or at least not increasing.


6.7.4 Sequential MAP Estimation

Another advantage of the sequential procedure is that the same procedure (and thus computer program) can be used for MAP estimation as well as for WLS, Gauss-Markov, ML, and OLS estimation provided certain standard assumptions are valid. Sets of assumptions permitting sequential analysis are 11--1112 and 11--1113. The condition that the measurement errors be uncorrelated in time is particularly important.


Example 6.7.3

An engineer has been given the task of measuring the thermal conductivity $k$ of a new electrical resistance heating wire. A linear curve of $k$ versus temperature $T$ is needed. Based on his experience with similar alloys he feels that the model $k=\beta_1+\beta_2T$ is reasonable, with prior estimates of $\beta_1$ and $\beta_2$ of $\mu_1=12$ W/m-°C and $\mu_2=0.01$ W/m-°C², where $T$ is in °C. He estimates that the covariance matrix for these values is

$$V_\beta = \begin{bmatrix} 2 & -0.002 \\ -0.002 & 10^{-5} \end{bmatrix}$$

and the prior distribution is normal.

For the new alloy he obtained the following measurements. The error in the temperature level can be neglected but the standard deviation in each measurement of $k$ is about 0.2. Also the errors are independent and normal.

$i$     $T$ (°C)     Measured value of $k$ (W/m-°C)
1       20           11.47
2       20           10.94
3       21           11.15
4       100          11.85
5       150          12.55
6       200          13.18
7       250          13.48
8       297          13.90
9       300          14.54
10      302          14.36

Notice that the measurements tend to be concentrated at the extreme temperatures of 20 and 300°C. If there were no uncertainty in the adequacy of the linear-in-$T$ model and if there were no prior information, the optimum design would consist of one-half of the measurements being at each extreme $T$. The experimenter compromised by putting most of the measurements at the extremes, but some intermediate values were included.

Estimate sequentially the parameters in the model $k=\beta_1+\beta_2T$ with and without the prior information.

Solution 

This problem can be viewed as being one involving subjective prior information. The prior means of $\beta_1$ and $\beta_2$ are 12 and 0.01, respectively. The $\mathbf{P}_0$ elements are $P_{11} = 2$, $P_{12} = -0.002$, and $P_{22} = 10^{-5}$. The $\sigma_i^2$ values are 0.04. The algorithm given in Example 6.7.1 can be used to estimate the parameters with $X_1 = 1$ and $X_2$ being the T values. The results are given in Table 6.13. Notice that the first two observations (both of which are at T = 20°C) yield estimates of $b_1$ and $b_2$ which are near the final values. The variance of $b_1$, which is given by $P_{11}$, reduces considerably as a result of the first two observations. This is not true, however, for $b_2$ since $P_{22}$ decreases only slightly. This result is reasonable because $b_2$ represents the slope of k which, in the absence of prior information, requires measurements at two or more different $T_i$ values. With all the observations used, both $P_{11}$ and $P_{22}$ have decreased considerably, indicating that the new measurements substantially reduced the experimenter's uncertainty.
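The sequential calculation for this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm of Example 6.7.1 itself: it uses the standard recursive form in which a single scalar is inverted per observation, and the function name `sequential_estimate` and the variable names are assumptions made here. The first and third conductivity values (11.47 and 11.15) are partly illegible in the data table and are assumed values inferred from the tabulated estimates.

```python
def sequential_estimate(T_values, k_values, b0, P0, sigma2):
    """Sequential estimates for k = b1 + b2*T, one scalar observation at a time."""
    b = list(b0)
    P = [row[:] for row in P0]
    history = []
    for T, k in zip(T_values, k_values):
        x = (1.0, float(T))                       # sensitivities X1 = 1, X2 = T
        Px = [P[0][0] * x[0] + P[0][1] * x[1],
              P[1][0] * x[0] + P[1][1] * x[1]]    # P x
        delta = sigma2 + x[0] * Px[0] + x[1] * Px[1]   # the only quantity inverted
        gain = [Px[0] / delta, Px[1] / delta]
        residual = k - (x[0] * b[0] + x[1] * b[1])
        b = [b[0] + gain[0] * residual, b[1] + gain[1] * residual]
        P = [[P[r][c] - gain[r] * Px[c] for c in range(2)] for r in range(2)]
        history.append((b, P))
    return history


# Data of this example; 11.47 and 11.15 are assumed (inferred) values.
T_DATA = [20, 20, 21, 100, 150, 200, 250, 297, 300, 302]
K_DATA = [11.47, 10.94, 11.15, 11.85, 12.55, 13.18, 13.48, 13.90, 14.54, 14.36]

# With the subjective prior: means (12, 0.01) and the prior covariance matrix.
with_prior = sequential_estimate(T_DATA, K_DATA, [12.0, 0.01],
                                 [[2.0, -0.002], [-0.002, 1e-5]], 0.04)
# "No prior information" imitated by b0 = 0 and a large diagonal P0.
no_prior = sequential_estimate(T_DATA, K_DATA, [0.0, 0.0],
                               [[1e4, 0.0], [0.0, 1e4]], 0.04)

for i, (b, P) in enumerate(with_prior, start=1):
    print(i, round(b[0], 3), round(b[1] * 100, 4), round(P[0][0] * 100, 3))
```

Running the same function with a large diagonal $\mathbf{P}_0$ and zero starting values imitates the no-prior case; after all ten observations the two sets of estimates differ only slightly.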

Table 6.14 gives typical results for sequential estimation with no prior information. Using (6.7.10) it is found that $\mathbf{P}_0 = 10^4\mathbf{I}$ is large but not too large for 12 significant figure accuracy. In contrast with the prior information case, the first two observations do not yield estimates that are reasonable. The reduction in the P matrix is negligible from the first to second observation. This is because both are at


Table 6.13 Sequential Estimates Using Prior Information for Example 6.7.3

   i     b1        b2×10²     P11×10²    P12×10⁵    P22×10⁷

   0    12         1          200        -200       100
   1    11.271     1.0669     4.399      -20.37     83.50
   2    10.997     1.0921     2.386      -18.52     83.33
   3    10.971     1.0934     1.719      -18.18     83.32
   4    10.973      .9591     1.718      -17.56     42.58
   5    10.961     1.0228     1.634      -13.33     21.20
   6    10.940     1.0803     1.513      -9.916     11.58
   7    10.960     1.0409     1.393      -7.557     6.930
   8    10.979     1.0127     1.290      -5.977     4.512
   9    10.933     1.0813     1.246      -5.318     3.521
  10    10.922     1.0977     1.222      -4.953     2.982




Table 6.14 Sequential Estimates for No Prior Information and $\mathbf{P}_0 = 10^4\mathbf{I}$ for Example 6.7.3; $\sigma_i^2 = 0.04$, $\mathbf{b}_0 = \mathbf{0}$

   i     b1        b2×10²     P11×10²      P12×10⁵      P22×10⁷

   0     0         0          10⁶          0            10¹¹
   1      .0286    57.207     9.98×10⁵     -4.99×10⁷    2.49×10⁸
   2      .0279    55.885     9.98×10⁵     -4.99×10⁷    2.49×10⁸
   3    12.274     -5.349     2475.9       -1.22×10⁵    5.99×10⁵
   4    11.018      .8318     2.361        -33.82       84.02
   5    10.967     1.0054     1.875        -17.28       27.78
   6    10.935     1.0824     1.627        -11.27       13.24
   7    10.958     1.0399     1.455        -8.127       7.475
   8    10.977     1.0113     1.328        -6.258       4.732
   9    10.928     1.0833     1.276        -5.510       3.652
  10    10.916     1.1001     1.247        -5.104       3.075


T = 20°C. Reasonable values of $b_1$ and $b_2$ appear only at $i \ge 4$. It is only at $i = 4$ that T changes to 100°C after being near T = 20°C for the first three observations. Both $b_1$ and $b_2$ can be estimated for the linear model, $k = \beta_1 + \beta_2 T$, only if measurements are made at two or more T values. Notice that the P components in Table 6.14 decrease in magnitude more rapidly than for Table 6.13. Hence as the number of observations increases the importance of the prior information diminishes. Prior information always reduces parameter uncertainty, however.


6.7.5 Multiresponse Sequential Parameter Estimation

When several (m > 1) dependent variables are measured at the same time, it is sometimes possible to renumber them so that in effect m = 1. This can be done if OLS estimation is being used or if ML or MAP estimation is used and the measurements are independent at each time as well as with time. As an example consider the temperature data given in Table 7.14 where there are eight measurements made at each time. Assume that a sequential OLS analysis is to be performed. The temperature measurements of this table can be described either by

$$Y_j(i), \qquad j = 1, 2, \ldots, 8; \quad i = 1, 2, \ldots, n$$

or

$$Y_k, \qquad k = 1, 2, \ldots, 8, 9, 10, \ldots, 8n-2, 8n-1, 8n$$

By using the latter numbering, the problem is changed from one with m = 8 to m = 1. See Problem 6.26.
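The renumbering can be illustrated with a short Python sketch. The mapping $k = m(i-1) + j$ is one natural choice consistent with the numbering above; the function name `single_index` is an assumption made here for illustration.

```python
def single_index(j, i, m=8):
    """Map response number j (1..m) at time i (1..n) to the single index k."""
    return m * (i - 1) + j


# The eight measurements at time i = 2 become observations 9 through 16.
print([single_index(j, 2) for j in range(1, 9)])
```

With this numbering each renumbered observation can be processed as a single scalar measurement.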





6.7.6 Ridge Regression Estimation 

Starting about 1960 A. E. Hoerl and R. W. Kennard [17-21] developed a 
procedure called ridge analysis, which is a graphical method for depicting 
the characteristics of second-order regression functions having many inde- 
pendent variables. To a related procedure Hoerl gave the name “ridge 
regression,” which he and Kennard have pointed out can have a general 
Bayesian interpretation. Hence the Bayesian estimation procedure which 
we call maximum a posteriori is related to ridge regression. 

For the standard assumptions implied by 1111--11, OLS estimation provides the estimator given by (6.2.5). This estimator among all linear unbiased estimators provides the minimum variance (for these assumptions). The covariance matrix of $\mathbf{b}_{LS}$ is given by (6.2.11). For convenience, Hoerl and Kennard scale the independent variables so that $\mathbf{X}^T\mathbf{X}$ has diagonal elements all equal to one. If the eigenvalues of $\mathbf{X}^T\mathbf{X}$ are denoted $\lambda_j$, $j = 1, 2, \ldots, p$, then a seriously “ill-conditioned” (relatively small $|\mathbf{X}^T\mathbf{X}|$) problem is characterized by the smallest eigenvalue $\lambda_{\min}$ being very much smaller than unity. Hoerl and Kennard have noted that OLS estimation provides inadequate estimators for an ill-conditioned problem since $\sigma^2/\lambda_{\min}$ is a lower bound for the average squared distance between $\mathbf{b}_{LS}$ and $\boldsymbol{\beta}$. Thus for such cases $\mathbf{b}_{LS}$ is expected to be far from the true vector $\boldsymbol{\beta}$ with the absolute values of the elements of $\mathbf{b}_{LS}$ being too large.

The ridge regression estimator is given by

$$\mathbf{b}^* = (\mathbf{X}^T\mathbf{X} + K\mathbf{I})^{-1}\mathbf{X}^T\mathbf{Y} \qquad (6.7.13)$$


for $K > 0$. With $\mathbf{X}^T\mathbf{X}$ scaled to have unity diagonal terms, values of K in the range of $10^{-4}$ to 1 are typical. There is an “optimum” value of K for any problem; Hoerl and Kennard discuss methods for selecting K. The
MAP estimator given by (6.6.6c) yields the same estimates as (6.7.13) if $\boldsymbol{\mu}_\beta$ is replaced by $\mathbf{0}$, $\boldsymbol{\psi}$ by $\mathbf{I}$, and $\mathbf{V}_\beta^{-1}$ by $K\mathbf{I}$. This has the effect of introducing the subjective prior information that the mean parameter values are zero; then the estimates given by (6.7.13) have smaller absolute values as K becomes larger. Hence as indicated by Theorem 2 of Marquardt's paper [21], (6.7.13) has the potential of reducing the inflated OLS parameter estimates found in ill-conditioned cases. Though the estimator given by (6.7.13) provides biased estimates, Hoerl and Kennard [17] have demonstrated that there exists a $K > 0$ such that $E[(\mathbf{b}^* - \boldsymbol{\beta})^T(\mathbf{b}^* - \boldsymbol{\beta})] < E[(\mathbf{b}_{LS} - \boldsymbol{\beta})^T(\mathbf{b}_{LS} - \boldsymbol{\beta})]$ provided $\boldsymbol{\beta}^T\boldsymbol{\beta}$ is bounded. Evidently the MAP formulation has many different interpretations and uses.
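The shrinking effect of K can be seen in a small two-parameter sketch of (6.7.13). The data below are invented purely for illustration (two nearly collinear columns), the columns are not rescaled to unit diagonal as Hoerl and Kennard recommend, and the function name `ridge_two_params` is an assumption made here.

```python
def ridge_two_params(X, Y, K):
    """b* = (X^T X + K I)^(-1) X^T Y for two parameters, using a 2x2 inverse."""
    a11 = sum(x[0] * x[0] for x in X) + K
    a12 = sum(x[0] * x[1] for x in X)
    a22 = sum(x[1] * x[1] for x in X) + K
    g1 = sum(x[0] * y for x, y in zip(X, Y))
    g2 = sum(x[1] * y for x, y in zip(X, Y))
    det = a11 * a22 - a12 * a12
    return ((a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)


# Hypothetical, nearly collinear data (not taken from the text).
X = [(1.0, 1.00), (1.0, 1.10), (1.0, 0.90), (1.0, 1.05)]
Y = [2.1, 2.0, 2.2, 1.9]

for K in (0.0, 0.01, 0.1, 1.0):
    b = ridge_two_params(X, Y, K)
    print(K, b, b[0] ** 2 + b[1] ** 2)   # squared length shrinks as K grows
```

Setting K = 0 recovers the OLS estimate; increasing K reduces the squared length of the estimate vector, which is the behavior described above for ill-conditioned problems.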





6.7.7 Comments and Conclusions on the Sequential Estimation Method

In this subsection some observations regarding the sequential method are given.

1. The method is general as it includes MAP, ML, Gauss-Markov, WLS, and OLS estimators. When MAP and ML estimators are used it is necessary that the measurement errors $\boldsymbol{\varepsilon}$ be additive, normal, and independent in “time” designated by i. If the observations are not independent in time, sometimes transformations can be made to obtain new dependent variables that are independent. In Appendix 6A it is shown how certain autoregressive (AR) models for the observation errors can be treated by constructing independent combinations of the observations. It is assumed that the statistical parameters $\sigma^2$ and $\rho$ are known. See also Section 6.9.2.2.

If (1) $\sigma^2$ is unknown for $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ where $\boldsymbol{\Omega}$ is known and (2) there is no prior information, the sequential procedure with $\sigma^2$ replaced by 1 can be used to estimate the parameters. After $\mathbf{b}_n$ is found, $\sigma^2$ can be estimated using $s^2$; the estimated covariance matrix would then be $\mathbf{P}_n$ times $s^2$.

2. If the problem can be formulated so that there is only one independent observation at each i, only a scalar needs to be inverted regardless of how many parameters are present. There are no simultaneous equations to solve.

3. The method readily extends to more than one unknown parameter. The summations in (6.7.8) can be easily programmed for an arbitrary value of p, the number of parameters in the b vector.

4. An examination of the parameters as a function of the index i can yield information that is not readily available otherwise. First, the models are usually chosen to contain parameters that are constant with time. If inspection of the parameters indicates that there is a time dependence (as in Table 6.12), then the adequacy of the model is questioned. Second, one can obtain an immediate “feel” of the effect of an additional observation which does not depend upon any statistical knowledge of the probability densities. The change in the parameters becomes less as more observations are used.

5. Good practice usually entails an inspection of the residuals. For the linear parameter case the true residuals are not obtained directly in a sequential manner. The residuals e based on the final parameter values ($\mathbf{b}_n$) are $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\mathbf{b}_n$. These values are not the same as those calculated based on $\mathbf{b}_i$.

6. The sequential method provides at each i a filtered estimate of $Y_i$ as $\hat{Y}_i = \mathbf{X}_i\mathbf{b}_i$. These are based on the data until time i and can be used in an on-line analysis. The sequential method given in this section can be related to the discrete Kalman filter [16, p. 159].

7. The sequential MAP estimator can also be interpreted as providing a ridge regression estimator which can be helpful when the data are ill-conditioned, that is, $|\mathbf{X}^T\mathbf{X}|$ is nearly zero.


6.8 MATRIX FORMULATION FOR CONFIDENCE INTERVALS 
AND REGIONS 


Much more information can be conveyed regarding parameters by specify- 
ing confidence intervals or regions in addition to parameter estimates. 
Whenever possible and appropriate it is recommended that confidence 
regions be presented in addition to estimates. 

In order to present meaningful confidence regions it is necessary that the 
underlying assumptions be valid. Two assumptions frequently violated in 
scientific work are that the errors have zero mean and that the errors are 
uncorrelated. Erroneously taking these assumptions to be true has led 
many to present overly small confidence intervals. Physical parameters 
have been presented by different experimenters with each successive esti- 
mate being outside the preceding confidence interval. This has happened 
so often that one should be very careful to check his underlying assump- 
tions before presenting his results. Further discussion of assumptions is 
given in Sections 5.10-5.14 and Section 6.9. Presentation of confidence 
regions is recommended but must be carefully and honestly given. 

For additive, zero mean, normal measurement errors the joint probability density for the parameter vector b can be written as

$$f(\mathbf{b}) = (2\pi)^{-p/2}|\mathbf{V}_b|^{-1/2}\exp\left[-\tfrac{1}{2}(\mathbf{b}-\boldsymbol{\beta})^T\mathbf{V}_b^{-1}(\mathbf{b}-\boldsymbol{\beta})\right] \qquad (6.8.1)$$

where $\mathbf{V}_b$ is the covariance matrix of b. It is given by (6.2.11) for ordinary least squares, by (6.5.5) for maximum likelihood, and by (6.6.11) for MAP estimation. (The assumptions are different for each case.) For convenience in representing cov(b) [or cov($\mathbf{b}-\boldsymbol{\beta}$) for MAP cases], let us use P for each case,

$$\mathbf{V}_b = \mathbf{P} = [P_{ij}] \qquad (6.8.2)$$

To obtain confidence regions, the covariance matrix of $\boldsymbol{\varepsilon}$, that is, $\boldsymbol{\psi}$, should be known at least within a multiplicative constant. Section 6.8.1 gives confidence intervals. Section 6.8.2 provides a derivation of a confidence region provided $\boldsymbol{\psi}$ is completely known. For the more general case of $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ where $\sigma^2$ is unknown and $\boldsymbol{\Omega}$ known, a confidence region analysis is given in Section 6.8.3.

6.8.1 Confidence Intervals

A confidence interval can be found for each parameter $\beta_k$ through the use of the kth diagonal term in P and the t distribution (if it applies). See Theorem 6.2.3. Suppose that the measurement errors are additive, zero mean, and normal. Also let there be no errors in the independent variables and no prior information regarding the parameters. Also let $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ where $\sigma^2$ is unknown and $\boldsymbol{\Omega}$ known. These assumptions are designated 11--1011. Suppose that OLS or ML has been used and $\sigma^2$ was replaced by any constant $c^2$ and the matrix $\tilde{\mathbf{P}}$ was calculated; for example, for ML it is

$$\tilde{\mathbf{P}} = c^2(\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}, \qquad c \ne 0$$

The tilde (~) is used because $\sigma^2$ is unknown. With the above assumptions, for OLS or ML we can give the estimated standard error of $b_k$ as

$$\text{est. s.e.}(b_k) = (\tilde{P}_{kk}\,s^2 c^{-2})^{1/2} \qquad (6.8.3)$$

where $s^2$ is the estimated value of $\sigma^2$. Then the 100(1 - α)% confidence interval is given by

$$b_k - \text{est. s.e.}(b_k)\,t_{1-\alpha/2}(n-p) < \beta_k < b_k + \text{est. s.e.}(b_k)\,t_{1-\alpha/2}(n-p) \qquad (6.8.4)$$

where $t_{1-\alpha/2}(n-p)$ is the t statistic for n - p degrees of freedom. For 95% confidence and n - p = 10, we find in Table 2.15, $t_{.975}(10) = 2.23$. For a nonlinear example, see Example 7.7.2.

It should be noted that there is considerable danger in constructing confidence regions from confidence intervals found as above since such a procedure can yield highly inaccurate confidence regions.
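The interval calculation can be sketched as follows. The sketch assumes that the estimated standard error is $(\tilde{P}_{kk}s^2/c^2)^{1/2}$, that is, the kth diagonal term of $\tilde{\mathbf{P}}$ rescaled from the arbitrary constant $c^2$ to the estimate $s^2$. The function name and the numbers in the usage line are hypothetical; the t value must be supplied from a table such as Table 2.15.

```python
import math


def confidence_interval(b_k, P_tilde_kk, s2, c2, t_value):
    """Return (lower, upper) bounds for beta_k.

    P_tilde_kk : kth diagonal term of P computed with sigma^2 replaced by c2
    s2         : estimate of sigma^2
    t_value    : t statistic for 1 - alpha/2 and n - p degrees of freedom
    """
    est_se = math.sqrt(P_tilde_kk * s2 / c2)
    return (b_k - est_se * t_value, b_k + est_se * t_value)


# Hypothetical numbers: b_k = 1.0, P~_kk = 4.0 (with c2 = 1), s2 = 0.25,
# and t_.975(10) = 2.23 for n - p = 10.
print(confidence_interval(1.0, 4.0, 0.25, 1.0, 2.23))
```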

6.8.2 Confidence Regions for Known ψ

In this section $\boldsymbol{\psi}$ and thus P are assumed to be known. Consider the matrix product in the exponent of (6.8.1) and set it equal to $r^2$ since it is a nonnegative scalar,

$$(\mathbf{b}-\boldsymbol{\beta})^T\mathbf{P}^{-1}(\mathbf{b}-\boldsymbol{\beta}) = r^2 \qquad (6.8.5)$$

Let us think of the hyperellipsoid as being centered at the origin with coordinates being $b_k - \beta_k$. Let l be some specific value. For $r < l$, (6.8.5) represents the interior of a hyperellipsoid. At $r = l$, (6.8.5) produces hypersurfaces of constant probability density. A method for determining values for l is derived below. Although (6.8.5) with $r = l$ describes the ellipsoid, for many purposes a more convenient description can be given in terms of the directions and lengths of the axes of the ellipsoid. To transform (6.8.5) in such a way as to provide such a description, we first find the eigenvalues of P or $\mathbf{P}^{-1}$ since the eigenvalues of P are simply the reciprocals of those of $\mathbf{P}^{-1}$.

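For convenience in the following derivation, let $\mathbf{P}^{-1}$ be designated C whose eigenvalues are found by solving the determinantal equation

$$\begin{vmatrix} C_{11}-\lambda & C_{12} & \cdots & C_{1p} \\ C_{12} & C_{22}-\lambda & \cdots & C_{2p} \\ \vdots & \vdots & & \vdots \\ C_{1p} & C_{2p} & \cdots & C_{pp}-\lambda \end{vmatrix} = 0 \qquad (6.8.6)$$

The eigenvalues of C are designated $\lambda_1, \lambda_2, \ldots, \lambda_p$. For convenience in numbering the $\lambda_i$'s, let $\lambda_i \le \lambda_j$ for $i < j$.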

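Let $\mathbf{C} = \mathbf{P}^{-1}$ and let C be given by

$$\mathbf{C} = \mathbf{e}\boldsymbol{\Lambda}\mathbf{e}^T \qquad (6.8.7)$$

where

$$\boldsymbol{\Lambda} = \text{diag}[\lambda_1\ \lambda_2 \cdots \lambda_p] \qquad (6.8.8)$$

and where e is a $p \times p$ matrix,

$$\mathbf{e} = [\mathbf{e}_1\ \mathbf{e}_2 \cdots \mathbf{e}_p] = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1p} \\ e_{21} & e_{22} & \cdots & e_{2p} \\ \vdots & \vdots & & \vdots \\ e_{p1} & e_{p2} & \cdots & e_{pp} \end{bmatrix} \qquad (6.8.9)$$

The vector components of e are orthogonal and of unit length (i.e., orthonormal),

$$\mathbf{e}^T\mathbf{e} = \mathbf{I} \quad \text{or} \quad \sum_{k=1}^{p} e_{ki}e_{kj} = \delta_{ij} \qquad (6.8.10\text{a,b})$$

where $\delta_{ij}$ is one for $i = j$ and zero otherwise, which implies that

$$\mathbf{e}^T = \mathbf{e}^{-1} \qquad (6.8.10\text{c})$$

Postmultiplying (6.8.7) by e gives

$$\mathbf{C}\mathbf{e} = \mathbf{e}\boldsymbol{\Lambda}\mathbf{e}^T\mathbf{e} = \mathbf{e}\boldsymbol{\Lambda} \qquad (6.8.11)$$

Introducing the vector components of e into (6.8.11) yields

$$\mathbf{C}\mathbf{e}_i = \lambda_i\mathbf{e}_i \qquad (6.8.12)$$

which can be considered to comprise a set of p linear homogeneous equations with the unknowns $e_{1i}, e_{2i}, \ldots, e_{pi}$. For example, for p = 3 we have

$$(C_{11}-\lambda_i)e_{1i} + C_{12}e_{2i} + C_{13}e_{3i} = 0 \qquad (6.8.13\text{a})$$

$$C_{12}e_{1i} + (C_{22}-\lambda_i)e_{2i} + C_{23}e_{3i} = 0 \qquad (6.8.13\text{b})$$

$$C_{13}e_{1i} + C_{23}e_{2i} + (C_{33}-\lambda_i)e_{3i} = 0 \qquad (6.8.13\text{c})$$

which constitutes three equations with three unknowns, but since these equations are homogeneous, there are at most two independent equations. A third equation is found from (6.8.10b),

$$e_{1i}^2 + e_{2i}^2 + e_{3i}^2 = 1 \qquad (6.8.14)$$

Then $e_{1i}$, $e_{2i}$, and $e_{3i}$ would usually be found from a solution of (6.8.13a,b) and (6.8.14).

A new coordinate vector can be defined by

$$\mathbf{h} \equiv \mathbf{e}^T(\mathbf{b}-\boldsymbol{\beta}) \quad \text{or} \quad h_i = \mathbf{e}_i^T(\mathbf{b}-\boldsymbol{\beta}) \qquad (6.8.15)$$

Then introducing (6.8.7) and (6.8.15) in (6.8.5) produces

$$r^2 = (\mathbf{b}-\boldsymbol{\beta})^T\mathbf{e}\boldsymbol{\Lambda}\mathbf{e}^T(\mathbf{b}-\boldsymbol{\beta}) = \mathbf{h}^T\boldsymbol{\Lambda}\mathbf{h} = \sum_{i=1}^{p}\lambda_i h_i^2 \qquad (6.8.16)$$

Using the further transformation

$$z_i = \lambda_i^{1/2}h_i \qquad (6.8.17)$$

we can write

$$r^2 = \sum_{i=1}^{p} z_i^2 \qquad (6.8.18)$$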


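The probability of a point z (or $\mathbf{b}-\boldsymbol{\beta}$) lying inside the hypersphere $r^2 < l^2$, where l is some fixed value, is found by using the above transformation in (6.8.1) to obtain

$$P(r^2 < l^2) = (2\pi)^{-p/2}\int\cdots\int \exp\left(-\frac{r^2}{2}\right) dz_1\,dz_2\cdots dz_p \qquad (6.8.19)$$

The integration is performed over the interior of the hypersphere described by (6.8.18). Note that

$$db_1\,db_2\cdots db_p = |\mathbf{P}|^{1/2}\,dz_1\,dz_2\cdots dz_p \qquad (6.8.20)$$

is used in deriving (6.8.19). A volume element inside the hypersphere can be described by

$$dV = \frac{p\,\pi^{p/2}}{\Gamma(p/2+1)}\,r^{p-1}\,dr \qquad (6.8.21)$$

where $\Gamma(\cdot)$ is the gamma function. Then using (6.8.21) in (6.8.19) results in

$$P(r^2 < l^2) = \frac{p}{2^{p/2}\,\Gamma(p/2+1)}\int_0^l r^{p-1}\exp\left(-\frac{r^2}{2}\right) dr \qquad (6.8.22)$$

(If the transformation $r^2 = x$ is used, (6.8.22) is transformed to a form which is the integral of the chi-squared probability density function with p degrees of freedom.) For p = 1, (6.8.22) gives

$$P(r^2 < l^2) = \left(\frac{2}{\pi}\right)^{1/2}\int_0^l \exp\left(-\frac{r^2}{2}\right) dr = \text{erf}(l\,2^{-1/2}) \qquad (6.8.23\text{a})$$

and for p = 2

$$P(r^2 < l^2) = \int_0^l \exp\left(-\frac{r^2}{2}\right) r\,dr = 1 - \exp\left(-\frac{l^2}{2}\right) \qquad (6.8.23\text{b})$$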

These probabilities for l = 1, 2, and 3 are given in Table 6.15 for several values of the number of parameters p. These three l values are sometimes called the one-, two-, or three-sigma probabilities. Also given in Table 6.15 are the l values associated with the 90 and 95% confidence regions. Other




Table 6.15 Values of Confidence Region Probabilities for Various Numbers of Parameters

  No. of            Probability for                  l Value
  parameters    l = 1      l = 2     l = 3     For 90%       For 95%
                                               Confidence    Confidence

      1         .683       .955      .997      1.645         1.960
      2         .3935      .8647     .9889     2.146         2.447
      3         .200       .739      .971      2.500         2.795
      4         .0902      .594      .939      2.781         3.080
      6         .0144      .305      .9264     3.263         3.548
      8         .00173     .143      .657      3.655         3.938
     10         .000172    .0538     .467      3.998         4.279


values of $l_{1-\alpha}(p)$ can be obtained using the F or $\chi^2$ tables since

$$l_{1-\alpha}(p) = [p\,F_{1-\alpha}(p,\infty)]^{1/2} = [\chi^2_{1-\alpha}(p)]^{1/2} \qquad (6.8.24)$$
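For one and two parameters the probabilities follow directly from the closed forms (6.8.23a) and (6.8.23b), so the corresponding rows of Table 6.15 can be checked with a few lines of Python. The function names are assumptions made here; the inversion for the l value applies only to p = 2, where (6.8.23b) can be solved explicitly.

```python
import math


def prob_inside_p1(l):
    """Probability of r^2 < l^2 for one parameter, (6.8.23a)."""
    return math.erf(l / math.sqrt(2.0))


def prob_inside_p2(l):
    """Probability of r^2 < l^2 for two parameters, (6.8.23b)."""
    return 1.0 - math.exp(-l * l / 2.0)


def l_value_p2(confidence):
    """Invert (6.8.23b): the l giving the stated probability for p = 2."""
    return math.sqrt(-2.0 * math.log(1.0 - confidence))


for l in (1, 2, 3):
    print(l, prob_inside_p1(l), prob_inside_p2(l))
print(l_value_p2(0.90), l_value_p2(0.95))
```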


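In summary the confidence region for known P (and thus $\boldsymbol{\psi}$) is the interior of the hyperellipsoid

$$(\mathbf{b}-\boldsymbol{\beta})^T\mathbf{P}^{-1}(\mathbf{b}-\boldsymbol{\beta}) = l^2_{1-\alpha}(p) \qquad (6.8.25)$$

where $l_{1-\alpha}(p)$ is the l value associated with the 100(1 - α)% confidence region for p parameters in the model. This region is more conveniently described by locating the principal axes of the hyperellipsoid. The extremes of the axes in terms of the new coordinates $h_i$ are given by (6.8.15). The maximum coordinate values along the new axes are given by

$$\begin{aligned} h_1 &= \pm l_{1-\alpha}(p)\lambda_1^{-1/2}, & h_2 &= 0, & \ldots, & & h_p &= 0 \quad \text{(major axis)} \\ h_1 &= 0, & h_2 &= \pm l_{1-\alpha}(p)\lambda_2^{-1/2}, & h_3 &= 0, \ldots, & h_p &= 0 \\ &\ \vdots \\ h_1 &= 0, & h_2 &= 0, & \ldots, & & h_p &= \pm l_{1-\alpha}(p)\lambda_p^{-1/2} \end{aligned} \qquad (6.8.26)$$

where (6.8.16) is used with $r = l_{1-\alpha}(p)$. The $h_i$ values depend on e, which is found as suggested by (6.8.13a,b) and (6.8.14). We wish to relate these values to points in the $\mathbf{b}-\boldsymbol{\beta}$ coordinates. Equation 6.8.26 can be written in the matrix form

$$\mathbf{H} = \pm l_{1-\alpha}(p)\,\text{diag}[\lambda_1^{-1/2}\ \lambda_2^{-1/2} \cdots \lambda_p^{-1/2}] = \mathbf{e}^T\mathbf{B} \qquad (6.8.27)$$

where $\mathbf{B} = [\mathbf{B}_1\ \mathbf{B}_2 \cdots \mathbf{B}_p]$ and $\mathbf{B}_i$ contains the coordinates of $\mathbf{b}-\boldsymbol{\beta}$ for the ith axis. Solving for B using (6.8.10c) gives

$$\mathbf{B} = \pm l_{1-\alpha}(p)\,\mathbf{e}\,\text{diag}[\lambda_1^{-1/2}\ \lambda_2^{-1/2} \cdots \lambda_p^{-1/2}] \qquad (6.8.28)$$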

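Example 6.8.1

Consider the case of two parameters. Give an algebraic formulation of the confidence region for $\boldsymbol{\beta}$ using $\mathbf{C} = \mathbf{P}^{-1}$.

Solution

Since C is given by

$$\mathbf{C} = \begin{bmatrix} C_{11} & C_{12} \\ C_{12} & C_{22} \end{bmatrix}$$

we can expand (6.8.25) to

$$(b_1-\beta_1)^2C_{11} + 2(b_1-\beta_1)(b_2-\beta_2)C_{12} + (b_2-\beta_2)^2C_{22} = l^2_{1-\alpha}(2) \qquad (a)$$

However, we need new coordinates $h_1$ and $h_2$ so that (a) can be written

$$l^2_{1-\alpha} = \lambda_1h_1^2 + \lambda_2h_2^2 \qquad (b)$$

The eigenvalues $\lambda_1$ and $\lambda_2$ are found using (6.8.6),

$$(C_{11}-\lambda)(C_{22}-\lambda) - C_{12}^2 = \lambda^2 - \lambda(C_{11}+C_{22}) + (C_{11}C_{22}-C_{12}^2) = 0 \qquad (c)$$

which can be solved for $\lambda_1$ (being the smaller) and $\lambda_2$,

$$\lambda_1 = \{C_{11}+C_{22} - [(C_{11}+C_{22})^2 - 4C_{11}C_{22} + 4C_{12}^2]^{1/2}\}/2 \qquad (d)$$

$$\lambda_2 = \{C_{11}+C_{22} + [(C_{11}+C_{22})^2 - 4C_{11}C_{22} + 4C_{12}^2]^{1/2}\}/2 \qquad (e)$$

The coordinates $h_1$ and $h_2$ are given by (6.8.15),

$$h_1 = e_{11}(b_1-\beta_1) + e_{21}(b_2-\beta_2), \qquad h_2 = e_{12}(b_1-\beta_1) + e_{22}(b_2-\beta_2) \qquad (f)$$

The $e_{ij}$ components are found using (6.8.10) and (6.8.12) or for $e_{11}$ and $e_{21}$,

$$(C_{11}-\lambda_1)e_{11} + C_{12}e_{21} = 0$$

$$e_{11}^2 + e_{21}^2 = 1$$

which yields

$$e_{11} = \pm\left[1 + \frac{(C_{11}-\lambda_1)^2}{C_{12}^2}\right]^{-1/2} \qquad (g)$$

$$e_{21} = -\frac{(C_{11}-\lambda_1)}{C_{12}}\,e_{11} \qquad (h)$$

Similarly for $e_{12}$ and $e_{22}$ we have

$$C_{12}e_{12} + (C_{22}-\lambda_2)e_{22} = 0$$

$$e_{12}^2 + e_{22}^2 = 1$$

which has a solution of

$$e_{12} = -\frac{(C_{22}-\lambda_2)}{C_{12}}\,e_{22} \qquad (i)$$

$$e_{22} = \pm\left[1 + \frac{(C_{22}-\lambda_2)^2}{C_{12}^2}\right]^{-1/2} \qquad (j)$$

For two parameter cases it can be shown that $e_{12} = e_{21}$ and $e_{11} = -e_{22}$. Symmetry can not generally be arranged for p > 2.

The end points of the axes are given by (6.8.28),

$$\begin{bmatrix} (b_1-\beta_1)_{maj} & (b_1-\beta_1)_{min} \\ (b_2-\beta_2)_{maj} & (b_2-\beta_2)_{min} \end{bmatrix} = \pm l_{1-\alpha}(2)\begin{bmatrix} e_{11}\lambda_1^{-1/2} & e_{12}\lambda_2^{-1/2} \\ e_{21}\lambda_1^{-1/2} & e_{22}\lambda_2^{-1/2} \end{bmatrix} \qquad (k)$$

since $e_{11}e_{22} - e_{12}e_{21} = \pm 1$. This equation means, for example,

$$(b_1-\beta_1)_{maj} = \pm l_{1-\alpha}(2)\,e_{11}\lambda_1^{-1/2}, \qquad (b_2-\beta_2)_{maj} = \pm l_{1-\alpha}(2)\,e_{21}\lambda_1^{-1/2} \qquad (l)$$

$$(b_1-\beta_1)_{min} = \pm l_{1-\alpha}(2)\,e_{12}\lambda_2^{-1/2}, \qquad (b_2-\beta_2)_{min} = \pm l_{1-\alpha}(2)\,e_{22}\lambda_2^{-1/2} \qquad (m)$$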


Example 6.8.2

Find the 95% confidence region for i = 2 and i = 9 of Example 5.2.4.

Solution

Consider the i = 2 case first. The $\mathbf{C} = \mathbf{P}^{-1}$ matrix for OLS estimation with the standard assumptions is $\mathbf{X}^T\mathbf{X}\sigma^{-2}$. Since $\sigma^2 = 1$ we have

$$\mathbf{C} = \begin{bmatrix} 2 & \sum X_j \\ \sum X_j & \sum X_j^2 \end{bmatrix} = \begin{bmatrix} 2 & 10 \\ 10 & 100 \end{bmatrix}$$

From (d) and (e) of Example 6.8.1 we find $\lambda_1 = 0.990001$ and $\lambda_2 = 101.009999$. The components of e are found from (g), (h), (i), and (j) to be

$$\mathbf{e} = \begin{bmatrix} -0.9949382 & +0.1004886 \\ +0.1004886 & +0.9949382 \end{bmatrix}$$

From (k) the end points of the ellipse are given by

$$(b_1-\beta_1)_{maj} = \pm(2.447)(-0.9949382)(0.990001)^{-1/2} = \mp 2.447$$

$$(b_2-\beta_2)_{maj} = \pm 2.447(0.1004886)(0.990001)^{-1/2} = \pm 0.2471$$

$$(b_1-\beta_1)_{min} = \pm 2.447(0.1004886)(101.009999)^{-1/2} = \pm 0.02447$$

$$(b_2-\beta_2)_{min} = \pm 2.447(0.9949382)(101.009999)^{-1/2} = \pm 0.2422$$

These are the end points of the ellipse

$$l^2_{1-\alpha}(2) = 2.447^2 = \lambda_1h_1^2 + \lambda_2h_2^2 = 0.990001\,h_1^2 + 101.009999\,h_2^2$$

which is shown as the larger ellipse in Fig. 6.5.

For i = 9, the C matrix is

$$\mathbf{C} = \begin{bmatrix} 9 & 360 \\ 360 & 20400 \end{bmatrix}$$

for which $\lambda_1 = 2.646235$ and $\lambda_2 = 20406.35377$. The e matrix is

$$\mathbf{e} = \begin{bmatrix} -0.99984429 & 0.0176466 \\ 0.0176466 & 0.99984429 \end{bmatrix}$$

and the end points of the confidence region are

$$(b_1-\beta_1)_{maj} = \mp 1.504, \qquad (b_2-\beta_2)_{maj} = \pm 0.0265$$

$$(b_1-\beta_1)_{min} = \pm 3.02\times10^{-4}, \qquad (b_2-\beta_2)_{min} = \pm 0.0171$$

This curve is also shown in Fig. 6.5; it is very narrow, indicating less uncertainty in $\beta_2$ than for $\beta_1$.

The estimated values of $\beta_1$ and $\beta_2$ using the nine observations in Table 6.12 are 1.286 and 0.10199, respectively, and thus $b_1 - \beta_1 = 0.286$ and $b_2 - \beta_2 = 0.0199$. (The true values of $\beta_1$ and $\beta_2$ are 1 and 0.1.) This value is outside the 95% confidence region for i = 9 because the value given by (6.8.5),

$$(\mathbf{b}-\boldsymbol{\beta})^T\mathbf{C}(\mathbf{b}-\boldsymbol{\beta}) = (b_1-\beta_1)^2C_{11} + 2(b_1-\beta_1)(b_2-\beta_2)C_{12} + (b_2-\beta_2)^2C_{22}$$

is equal to 12.9, which is greater than $l^2_{1-\alpha}(2) = 5.99$. One can observe directly from the plot in Fig. 6.5 that the estimates $b_{1,2}$ and $b_{2,2}$ are inside the i = 2 confidence region. (See the point X at $b_1 - \beta_1 = -0.75$ in the figure.)
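The two-parameter calculation of this example can be sketched in Python. The sketch uses the closed-form eigenvalues (d) and (e) of Example 6.8.1 and obtains each unit eigenvector from the first homogeneous equation, which assumes $C_{12} \ne 0$. The overall sign of each eigenvector is arbitrary, so the end points may differ in sign from those quoted above; the function name `ellipse_axes` is an assumption made here.

```python
import math


def ellipse_axes(C11, C12, C22, l):
    """Eigenvalues of C and the end points of the major and minor axes.

    Returns [(lambda_1, (db1, db2)), (lambda_2, (db1, db2))], where (db1, db2)
    is one end point of the axis in b - beta coordinates; the other end point
    is its negative. Assumes C12 != 0.
    """
    disc = math.sqrt((C11 + C22) ** 2 - 4.0 * C11 * C22 + 4.0 * C12 ** 2)
    result = []
    for lam in ((C11 + C22 - disc) / 2.0, (C11 + C22 + disc) / 2.0):
        # (C11 - lam) e1 + C12 e2 = 0  ->  (e1, e2) proportional to (C12, -(C11 - lam))
        e1, e2 = C12, -(C11 - lam)
        norm = math.hypot(e1, e2)
        scale = l / (norm * math.sqrt(lam))
        result.append((lam, (e1 * scale, e2 * scale)))
    return result


# i = 2 case: C = [[2, 10], [10, 100]] with l = 2.447 for 95% and p = 2.
for lam, end_point in ellipse_axes(2.0, 10.0, 100.0, 2.447):
    print(lam, end_point)
```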




Figure 6.5 Confidence regions for Example 6.8.2.




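6.8.3 Confidence Regions for $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ with $\boldsymbol{\Omega}$ Known and $\sigma^2$ Unknown

In the previous subsection we considered the case of known $\boldsymbol{\psi}$. In this section $\boldsymbol{\psi}$ is equal to $\sigma^2$ times $\boldsymbol{\Omega}$ where $\sigma^2$ is unknown and $\boldsymbol{\Omega}$ is known. This case is analyzed by developing an F statistic, which is the ratio of two $\chi^2$ statistics each divided by their respective degrees of freedom. The first $\chi^2$ statistic is $(\mathbf{b}-\boldsymbol{\beta})^T\mathbf{P}^{-1}(\mathbf{b}-\boldsymbol{\beta})$, which has p degrees of freedom. The assumptions are designated 11--1011 which include zero mean, normal measurement errors. In this section we assume that there is no prior information regarding the parameters. Maximum likelihood estimation is assumed.

The second statistic is

$$R_{ML} = (\mathbf{Y}-\hat{\mathbf{Y}}_{ML})^T\boldsymbol{\psi}^{-1}(\mathbf{Y}-\hat{\mathbf{Y}}_{ML}) \qquad (6.8.29)$$

which we wish to show is a $\chi^2$ statistic with n - p degrees of freedom. As usual, $\boldsymbol{\psi} = \text{cov}(\boldsymbol{\varepsilon})$. Notice that ordinary least squares analysis is not permitted unless $\boldsymbol{\Omega} = \mathbf{I}$ for which case $\mathbf{b}_{LS} = \mathbf{b}_{ML}$. In general, $\boldsymbol{\Omega}$ is not diagonal. It could result from autoregressive observation errors, for example.

We shall transform $R_{ML}$ to a sum of squares using the identity

$$\boldsymbol{\Omega} = \mathbf{D}\boldsymbol{\Phi}\mathbf{D}^T \qquad (6.8.30)$$

where D and $\boldsymbol{\Phi}$ are $n \times n$ matrices and $\boldsymbol{\Phi}$ is diagonal. Note that*

$$\boldsymbol{\Omega}^{-1} = (\mathbf{D}^{-1})^T\boldsymbol{\Phi}^{-1}\mathbf{D}^{-1} = [\boldsymbol{\Phi}^{-1/2}\mathbf{D}^{-1}]^T[\boldsymbol{\Phi}^{-1/2}\mathbf{D}^{-1}] \qquad (6.8.31)$$

and thus $R_{ML}$ can be written

$$R_{ML} = (\mathbf{Y}-\hat{\mathbf{Y}}_{ML})^T\boldsymbol{\Omega}^{-1}(\mathbf{Y}-\hat{\mathbf{Y}}_{ML})\sigma^{-2} = (\mathbf{F}-\hat{\mathbf{F}}_{ML})^T(\mathbf{F}-\hat{\mathbf{F}}_{ML})\sigma^{-2} \qquad (6.8.32)$$

$$\mathbf{F} \equiv \boldsymbol{\Phi}^{-1/2}\mathbf{D}^{-1}\mathbf{Y}, \qquad \mathbf{Z} \equiv \boldsymbol{\Phi}^{-1/2}\mathbf{D}^{-1}\mathbf{X} \qquad (6.8.33)$$

*For this case the square root of the diagonal matrix $\boldsymbol{\Phi}$ is a diagonal matrix having elements that are the positive square roots of corresponding elements in $\boldsymbol{\Phi}$.

Notice that (6.8.32) is a sum of squares. For the linear-in-the-parameters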



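model $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$, (6.8.32) can be written

$$R_{ML} = (\mathbf{F}^T\mathbf{F} - \mathbf{b}_{ML}^T\mathbf{Z}^T\mathbf{F})\sigma^{-2} \qquad (6.8.34)$$

which has a $\chi^2$ distribution with n - p degrees of freedom. See Theorem 6.2.1. Observe that the covariance matrix of F is

$$\text{cov}(\mathbf{F}) = E\left[\boldsymbol{\Phi}^{-1/2}\mathbf{D}^{-1}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T(\mathbf{D}^{-1})^T\boldsymbol{\Phi}^{-1/2}\right] = \sigma^2\mathbf{I}$$

For convenience, define $R^*$ and $(\mathbf{P}^*)^{-1}$ to be

$$R^* \equiv (\mathbf{F}-\hat{\mathbf{F}}_{ML})^T(\mathbf{F}-\hat{\mathbf{F}}_{ML}) \qquad (6.8.35\text{a})$$

$$(\mathbf{P}^*)^{-1} \equiv \mathbf{P}^{-1}\sigma^2 \qquad (6.8.35\text{b})$$

Since F in (6.8.35a) is a random vector satisfying the first five standard assumptions, Z is known, and there is no prior information, (6.8.35a) is analogous to (6.2.19) which can be used to get

$$s^2 = \frac{R^*}{n-p} \qquad (6.8.36)$$

In order to be consistent let $(\mathbf{P}^*)^{-1}$ be found for ML estimation or

$$(\mathbf{P}^*)^{-1} = \mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X} \qquad (6.8.37)$$

Now the F statistic is the ratio of two independent random variables, each with a $\chi^2$-divided-by-degrees-of-freedom distribution (see Section 2.8.10). Hence a joint confidence region for the parameter estimates can be found from

$$\frac{(\mathbf{b}_{ML}-\boldsymbol{\beta})^T(\mathbf{P}^*)^{-1}(\mathbf{b}_{ML}-\boldsymbol{\beta})/p}{R^*/(n-p)} = F_{1-\alpha}(p, n-p) \qquad (6.8.38)$$

or

$$(\mathbf{b}_{ML}-\boldsymbol{\beta})^T\mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X}(\mathbf{b}_{ML}-\boldsymbol{\beta}) = p\,s^2F_{1-\alpha}(p, n-p) \qquad (6.8.39)$$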


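In deriving (6.8.39) we have assumed that $\boldsymbol{\psi}$ is equal to $\sigma^2\boldsymbol{\Omega}$ where $\boldsymbol{\Omega}$ is known. When $\boldsymbol{\Omega} = \mathbf{I}$, (6.8.39) reduces to the more common expression,

$$(\mathbf{b}_{LS}-\boldsymbol{\beta})^T\mathbf{X}^T\mathbf{X}(\mathbf{b}_{LS}-\boldsymbol{\beta}) = p\,s^2F_{1-\alpha}(p, n-p)$$

since then $\mathbf{b}_{ML} = \mathbf{b}_{LS}$.

When dynamic experiments are performed and automatic digital data acquisition equipment is used, n is usually quite large, possibly several hundred or even thousands. In such cases $s^2 \approx \sigma^2$ and

$$p\,F_{1-\alpha}(p, n-p) \approx l^2_{1-\alpha}(p)$$

and then (6.8.39) approaches the known-$\boldsymbol{\psi}$ region (6.8.25). For example, for α = 0.05, corresponding to the 95% confidence region for two parameters (p = 2), the values of $p\,F_{1-\alpha}(p, n-p)$ are 6.18, 6.08, 6.04, and 6.01 for n = 100, 200, 400, and 1000, respectively.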
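For p = 2 the approach of $p\,F_{1-\alpha}(p, n-p)$ to $l^2_{1-\alpha}(2)$ can be checked without tables. The sketch below relies on a standard closed form that is not taken from this text: for two numerator degrees of freedom and $\nu$ denominator degrees of freedom, $F_{1-\alpha}(2,\nu) = (\nu/2)(\alpha^{-2/\nu} - 1)$, while $l^2_{1-\alpha}(2) = -2\ln\alpha$ follows from (6.8.23b). The function names are assumptions made here.

```python
import math


def p_times_F_two_params(alpha, n):
    """p * F_{1-alpha}(p, n - p) for p = 2, using the closed-form F quantile."""
    nu = n - 2
    return nu * (alpha ** (-2.0 / nu) - 1.0)


def l_squared_two_params(alpha):
    """Limiting value l^2_{1-alpha}(2), obtained by inverting (6.8.23b)."""
    return -2.0 * math.log(alpha)


for n in (100, 200, 400, 1000):
    print(n, round(p_times_F_two_params(0.05, n), 2))
print(round(l_squared_two_params(0.05), 2))
```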


6.9 MATRIX ANALYSIS WITH CORRELATED OBSERVATION ERRORS 


6.9.1 Introduction 

When automatic digital data acquisition equipment is used for dynamic experiments, very large numbers of measurements can be obtained. These observations may not be independent, however. Correlated measurements are frequently obtained when the same specimen is tested over some range. An example is the measurement of the electrical resistance of a piece of wire as a function of temperature such as at 20°C, 21°C, and so on. The measurements at 20 and 21°C may be correlated for a given specimen, but a 20°C value for one specimen probably would not be with a 21°C value for another specimen.

Another example of correlated errors is provided by the case of a cooling billet given by Example 6.2.3. Some of the temperatures recorded are depicted in Fig. 6.1 along with the associated residuals. These data are shown for 96-sec time steps, but observations were actually made at the smaller steps of 12 sec. If all the data between 0 and 1536 sec are used, the regression curve for the 129 observations is very close to that obtained using 17 observations. For the fourth-degree model given in Example 6.2.3, residuals in F° are plotted in Fig. 6.6 for the first 25 data points for Δt = 12 sec. On the upper side of the horizontal axis is a scale that corresponds to Fig. 6.1. The solid circles show the residuals. There are 11 consecutive negative residuals, followed by 14 positive residuals.



[Figure 6.6: residuals (solid circles) and first differences of the residuals (crosses) versus No. of observation with Δt = 12 seconds; one series carries the note “use right scale.”]


Residuals are not precisely independent even if the observation errors are, but for a large number of independent observations, the residuals are nearly independent. This is far from true for the residuals shown in Fig. 6.6; rather, the residuals are highly correlated. A least squares analysis of the measurements containing highly correlated errors may produce satisfactory parameter estimates. There are, however, at least two different dangers in the OLS analysis of highly correlated errors (i.e., $\boldsymbol{\psi}$ having relatively large off-diagonal terms). One of these is that one might present too small confidence regions based on the erroneous assumption of independent errors. Another danger relates to this point and also to experimental design: when observations are independent and unbiased, doubling the number of them significantly improves the accuracy of the parameter estimates. This may not be true if the additional measurements result in higher correlation between the measurements, as in the billet example illustrated by Fig. 6.1 when the time step is halved and the number of measurements is doubled. In such cases one might erroneously design the experiment to have too small a time step.

A simple check to see if the residuals are approximately independent is based on the number of runs. (The number of runs is the number of changes in the signs of the residuals plus one. For example, for + - + + -, there are four runs.) For n independent, zero mean random variables the expected number of runs is (n + 1)/2. For significance tests based on runs, see references 4 and 22. For cumulative errors (see below) the expected number of runs is much smaller. The residuals of Fig. 6.1 exhibit six runs compared with (n + 1)/2 = 9. There seems to be no reason to question the independence of residuals. If there are still about six runs when the number of observations is doubled, we should question the independence. Certainly the residuals shown by the solid circles in Fig. 6.6 cannot be considered independent with only two runs while (n + 1)/2 = 13. Also shown in Fig. 6.6 as crosses are first differences of the residuals. There are 15 runs out of 24 points, indicating that the first differences are much closer to being independent than the residuals themselves.
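Counting runs is easily programmed. The following sketch counts sign changes plus one, as defined above; the function names are assumptions made here, and residuals that are exactly zero are simply skipped, which is a choice made for this illustration.

```python
def count_runs(residuals):
    """Number of runs: sign changes in the residual sequence plus one."""
    signs = [r > 0 for r in residuals if r != 0]   # zeros are skipped (assumption)
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def expected_runs_independent(n):
    """Expected number of runs for n independent, zero mean errors."""
    return (n + 1) / 2.0


# The sign pattern + - + + - has four runs.
print(count_runs([1.0, -1.0, 1.0, 1.0, -1.0]))
# Eleven negative residuals followed by 14 positive ones give two runs,
# to be compared with (25 + 1)/2 = 13.
print(count_runs([-1.0] * 11 + [1.0] * 14), expected_runs_independent(25))
```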

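6.9.2 Autoregressive Errors (AR)

In this section a first-order, single response, autoregressive model of errors is considered. (A more general analysis is given in Appendix 6A.) Let $\varepsilon_i$, the ith measurement error, be described by

$$\varepsilon_i = \rho\,\varepsilon_{i-1} + u_i \qquad (6.9.1\text{a})$$

$$\varepsilon_0 = 0, \qquad E(u_i) = 0 \qquad (6.9.1\text{b})$$

$$E(u_iu_j) = 0 \ \text{ for } i \ne j, \qquad E(u_i^2) = \sigma_i^2 \qquad (6.9.1\text{c})$$

for i = 1, 2, ..., n. In words, (6.9.1a) states that the present error is a fraction $\rho$ of the previous error plus a zero mean component $u_i$ that is independently distributed. This is called a first-order autoregressive process.

It is convenient to relate the $n \times 1$ vectors $\boldsymbol{\varepsilon}$ and u by

$$\boldsymbol{\varepsilon} = \mathbf{D}\mathbf{u} \qquad (6.9.2)$$

where D can be found from (6.9.1a) to be the lower triangular matrix,

$$\mathbf{D} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \rho & 1 & 0 & \cdots & 0 \\ \rho^2 & \rho & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{bmatrix} \qquad (6.9.3)$$

Also from (6.9.1a) we find [see (6A.5a)] the inverse of D to be

$$\mathbf{D}^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix} \qquad (6.9.4)$$

which contains a main diagonal of ones and a diagonal just below of $-\rho$.

The covariance matrix of the errors $\boldsymbol{\varepsilon}$, $\boldsymbol{\psi}$, is given by

$$\boldsymbol{\psi} = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T) = \mathbf{D}E(\mathbf{u}\mathbf{u}^T)\mathbf{D}^T = \mathbf{D}\boldsymbol{\phi}\mathbf{D}^T \qquad (6.9.5\text{a})$$

where (6.1.40) is used and where $\boldsymbol{\phi}$ is defined to be the diagonal matrix

$$\boldsymbol{\phi} = \text{diag}[\sigma_1^2\ \sigma_2^2 \cdots \sigma_n^2] \qquad (6.9.5\text{b})$$

Several classes of $\boldsymbol{\psi}$ matrices can be generated. For $\rho = 0$ and $\sigma_i^2 = \sigma^2$, a constant, the “standard” covariance matrix is found,

$$\boldsymbol{\psi} = \sigma^2\mathbf{I} \qquad (6.9.6)$$

Next, if $\rho = 0$, we have $\boldsymbol{\psi}$ which is

$$\boldsymbol{\psi} = \boldsymbol{\phi} = \text{diag}[\sigma_1^2\ \sigma_2^2 \cdots \sigma_n^2] \qquad (6.9.7)$$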




Also, several special autoregressive (AR) cases have error variances as follows

$$\sigma_1^2 = c\,\sigma^2, \qquad \sigma_i^2 = \sigma^2 \ \text{ for } i = 2, 3, \ldots, n \qquad (6.9.8)$$

for which the $\boldsymbol{\psi}$ matrix, designated $\boldsymbol{\psi}_a$, becomes

$$\boldsymbol{\psi}_a = \sigma^2\begin{bmatrix} c & c\rho & c\rho^2 & \cdots & c\rho^{n-1} \\ & 1+c\rho^2 & \rho(1+c\rho^2) & \cdots & \rho^{n-2}(1+c\rho^2) \\ & & 1+\rho^2+c\rho^4 & \cdots & \rho^{n-3}(1+\rho^2+c\rho^4) \\ & \text{symmetric} & & \ddots & \vdots \\ & & & & 1+\rho^2+\rho^4+\cdots+\rho^{2(n-2)}+c\rho^{2(n-1)} \end{bmatrix} \qquad (6.9.9)$$

See (6A.18) of Appendix 6A for a derivation. Three special cases associated with (6.9.9) are as follows:

1. $\boldsymbol{\psi}_{a1}$ has $c = (1-\rho^2)^{-1}$; (6A.22a).

2. $\boldsymbol{\psi}_{a2}$ has c = 1 and $\rho = 1$; (6A.22b).

3. $\boldsymbol{\psi}_{a3}$ has c = 1; (6A.22c).

The $\boldsymbol{\psi}_{a1}$ matrix is the most common of the three special cases. It might be called a “steady-state” case because the diagonal terms are all equal,

$$V(\varepsilon_i) = \frac{\sigma^2}{1-\rho^2}, \qquad |\rho| < 1$$

Notice that as $\rho \to 1$, the $V(\varepsilon_i)$ values become much larger than $\sigma^2$. Physically, $\boldsymbol{\psi}_{a1}$ is appropriate for some process which has been going on a “long” time before the taking of measurements. The other extreme physical situation is for measurements starting when the process starts; this is better described by case 3. In case 3, the variance of $\varepsilon_i$ is a minimum at i = 1 (provided $\rho \ne 0$) and gradually increases to the steady-state value just given above.

Case 2 has a simple $\boldsymbol{\psi}$ matrix as given by (6A.22b). Notice, however, that the ith diagonal term is equal to $i\sigma^2$. Sometimes this case is considered to be unstable because the variance continually increases. We can use it, however, with $\rho = 1$ (although case 1 can not have $\rho = 1$). For case 2, $\varepsilon_i$ is found from (6.9.1) to be the sum

$$\varepsilon_i = \sum_{j=1}^{i} u_j$$

For this reason case 2 is called the cumulative error case. The difference of the successive values $\varepsilon_{i-1}$ and $\varepsilon_i$ is $u_i$, which is independent from $u_j$ for $j \ne i$.

Figure 6.6 shows residuals and differences of residuals for the billet example. Since the residuals are apparently correlated and the differences are not, the cumulative error model for $\varepsilon_i$ is better than that of independent $\varepsilon_i$. (Estimation of $\rho$ is discussed in Section 6.9.5.)
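The structure of the AR covariance matrix can be explored numerically by forming $\mathbf{D}\boldsymbol{\phi}\mathbf{D}^T$ directly. The sketch below assumes the lower triangular D with elements $\rho^{i-j}$ implied by (6.9.1a) and the variances $\sigma_1^2 = c\sigma^2$, $\sigma_i^2 = \sigma^2$ for $i \ge 2$; the function name `ar1_covariance` is an assumption made here.

```python
def ar1_covariance(n, rho, sigma2, c):
    """psi = D phi D^T for first-order autoregressive errors.

    D[i][j] = rho**(i - j) for i >= j (zero otherwise);
    phi = diag[c*sigma2, sigma2, ..., sigma2].
    """
    D = [[rho ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)]
    phi = [c * sigma2] + [sigma2] * (n - 1)
    return [[sum(D[i][k] * phi[k] * D[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


rho, sigma2, n = 0.5, 1.0, 4

# Case 1 ("steady state"): c = 1/(1 - rho^2) makes all diagonal terms equal.
psi_a1 = ar1_covariance(n, rho, sigma2, 1.0 / (1.0 - rho ** 2))
print([psi_a1[i][i] for i in range(n)])

# Case 2 (cumulative errors): rho = 1, c = 1 gives diagonal terms i * sigma2.
psi_a2 = ar1_covariance(n, 1.0, sigma2, 1.0)
print([psi_a2[i][i] for i in range(n)])

# Case 3: c = 1; the variance is smallest at i = 1 and increases with i.
psi_a3 = ar1_covariance(n, rho, sigma2, 1.0)
print([psi_a3[i][i] for i in range(n)])
```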

6.9.2.1 OLS Estimation With AR Errors

Ordinary least squares can always be used irrespective of any of the
standard assumptions. In specifying the covariance matrix of the parameters
or the confidence regions, some conditions must be known (or
assumed). Suppose that the conditions of additive, zero mean, first-order
autoregressive errors are valid. Also assume that $\psi$ is known within a
multiplicative constant. These conditions are designated 11-2-011.
The covariance matrix of $b_{LS}$ is given by (6.2.11),

$$\operatorname{cov}(b_{LS}) = (X^T X)^{-1} X^T \psi X (X^T X)^{-1} \tag{6.2.11}$$

The terms in $X^T X$ are given in a detailed form by (6.2.6). The $X^T \psi X$
portion of (6.2.11) is a little more difficult to evaluate. Using (6.9.5a) we
can write

$$X^T \psi X = X^T D \phi D^T X = (D^T X)^T \phi D^T X \tag{6.9.10}$$

Consider now the $D^T X$ matrix product, designated $Z$,

$$Z = D^T X$$

where the components $Z_{ij}$ can be found using the expressions

$$Z_{nj} = X_{nj}, \qquad Z_{ij} = X_{ij} + \rho Z_{i+1,j} \quad \text{for } i = n-1, n-2, \ldots, 1; \quad j = 1, 2, \ldots, p$$

Notice that the $Z_{ij}$ values are found starting at the "bottom" of each
column. Using this notation the $X^T \psi X$ product has typical terms, as
indicated by the $lk$ term below.

$$X^T \psi X = \left[ \sum_{i=1}^{n} Z_{il} Z_{ik} \sigma_i^2 \right] \tag{6.9.12}$$

Unfortunately, simple algebraic expressions for the covariance of $b_{LS}$ do
not result from (6.2.4) for AR errors; the computer evaluation of the terms
is straightforward, however.


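Example 6.9.1

Derive an expression for $V(b_{LS})$ for the simple model $\eta = \beta$ and for $\psi_{a1}$.

Solution

For this case, $X^T = [1 \ 1 \cdots 1]$ and $X^T X = n$. The matrix $Z$ is given by

$$Z^T = \left[ \sum_{i=1}^{n} \rho^{i-1} \quad \sum_{i=1}^{n-1} \rho^{i-1} \quad \cdots \quad 1+\rho+\rho^2 \quad 1+\rho \quad 1 \right] = \frac{1}{1-\rho} \left[ 1-\rho^n \quad 1-\rho^{n-1} \quad \cdots \quad 1-\rho^2 \quad 1-\rho \right]$$

and $Z^T \phi Z$ is

$$Z^T \phi Z = \frac{\sigma_u^2}{(1-\rho)^2} \left[ \frac{(1-\rho^n)^2}{1-\rho^2} + \sum_{i=1}^{n-1} (1-\rho^i)^2 \right]$$

The result for $V_{a1}(b_{LS})$, the variance of $b_{LS}$ for $\psi_{a1}$, can be written in the form

$$V_{a1}(b_{LS}) = \frac{\sigma_u^2}{1-\rho^2} \, \frac{1}{n(1-\rho)} \left[ 1 + \rho - \frac{2\rho(1-\rho^n)}{n(1-\rho)} \right] \tag{6.9.13}$$

For any fixed value of $\rho$ between 0 and 1 and increasing values of $n$, $V_{a1}(b_{LS})$
always decreases.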
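The result of Example 6.9.1 can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, and the closed form coded in `var_ols_closed_form` is the expression obtained by summing the series in $Z^T \phi Z$ for the steady-state ($a1$) case. The first function builds $Z = D^T X$ from the bottom of the column upward, as described in Section 6.9.2.1.

```python
def var_ols_mean_ar1(n, rho, sigma_u2=1.0):
    """V(b_LS) for eta = beta with steady-state AR(1) errors, via Z = D^T X."""
    # Bottom-up recursion: Z_n = X_n, Z_i = X_i + rho * Z_{i+1}, with X_i = 1.
    Z = [0.0] * n
    Z[n - 1] = 1.0
    for i in range(n - 2, -1, -1):
        Z[i] = 1.0 + rho * Z[i + 1]
    # phi = sigma_u2 * diag[(1 - rho^2)^-1, 1, ..., 1]
    phi = [sigma_u2 / (1.0 - rho ** 2)] + [sigma_u2] * (n - 1)
    zt_phi_z = sum(z * z * p for z, p in zip(Z, phi))
    return zt_phi_z / n ** 2          # (X^T X)^-1 = 1/n appears on both sides

def var_ols_closed_form(n, rho, sigma_u2=1.0):
    """Closed-form variance obtained by summing the series (assumed form)."""
    sigma_e2 = sigma_u2 / (1.0 - rho ** 2)
    bracket = 1.0 + rho - 2.0 * rho * (1.0 - rho ** n) / (n * (1.0 - rho))
    return sigma_e2 * bracket / (n * (1.0 - rho))

for n in (2, 10, 100):
    print(n, var_ols_mean_ar1(n, 0.5), var_ols_closed_form(n, 0.5))
```

For $\rho = 0$ both functions reduce to $\sigma_u^2/n$, the familiar variance of a mean of independent observations.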

Example 6.9.2

Suppose $\rho$ and the number of observations $n$ are related by

$$\rho = e^{-a/n} \tag{6.9.14}$$

which gives greater correlation between observations for a fixed experimental range
as the observations become more "dense." In (6.9.14), $a$ is some constant characteristic
of the data. Using the result of Example 6.9.1, investigate $V_{a1}(b_{LS})$ for $n \to \infty$.
Assume that $\sigma_\epsilon^2 = \sigma_u^2 (1-\rho^2)^{-1}$ is held constant.

Solution

In (6.9.13), $\sigma_u^2/(1-\rho^2)$ is replaced by $\sigma_\epsilon^2$ and $\rho$ by $e^{-a/n}$. The result is an
indeterminate form. After using L'Hospital's rule we obtain

$$\lim_{n \to \infty} V_{a1}(b_{LS}) = \frac{2\sigma_\epsilon^2}{a^2} \left( a - 1 + e^{-a} \right) \tag{6.9.15}$$

See Fig. 6.7 for a plot of (6.9.15). If the measurements become more correlated as
$n$ becomes larger in the manner described by (6.9.14), $V_{a1}(b_{LS})$ approaches a
constant value for large $n$ rather than going to zero as one obtains from (6.9.13) for
$\rho$ = constant and $n \to \infty$.



Figure 6.7 Variance of $b$ for $n \to \infty$ found for first-order autoregressive errors using least
squares and maximum likelihood.

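6.9.2.2 ML Estimation With AR Errors

In maximum likelihood or Gauss-Markov estimation the estimator for the
linear model is given by (6.5.2),

$$b_{ML} = (X^T \psi^{-1} X)^{-1} X^T \psi^{-1} Y \tag{6.5.2}$$

For ML this equation follows from the assumptions 11--1111.
It simplifies calculations in (6.5.2) if the relation $\psi^{-1} = (D^{-1})^T \phi^{-1} D^{-1}$ is used,

$$b_{ML} = \left( X^T (D^{-1})^T \phi^{-1} D^{-1} X \right)^{-1} X^T (D^{-1})^T \phi^{-1} D^{-1} Y \tag{6.9.16a}$$

$$b_{ML} = (Z^T \phi^{-1} Z)^{-1} Z^T \phi^{-1} F \tag{6.9.16b}$$

where

$$Z = D^{-1} X, \qquad F = D^{-1} Y \tag{6.9.16c}$$

Typical terms of $Z$ and $F$ are given by

$$Z_{1j} = X_{1j}, \qquad Z_{ij} = X_{ij} - \rho X_{i-1,j} \quad \text{for } i = 2, 3, \ldots, n \tag{6.9.17a}$$

$$F_1 = Y_1, \qquad F_i = Y_i - \rho Y_{i-1} \quad \text{for } i = 2, 3, \ldots, n \tag{6.9.17b}$$

where $D^{-1}$ displayed by (6.9.4) is used. Note that by replacing $Y_i$ by $F_i$, the
modified observations $F_i$ are uncorrelated with one another, since from (6.9.1a),
$F_i = \eta_i - \rho \eta_{i-1} + u_i$ and the $u_i$'s are uncorrelated.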
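A minimal sketch of the modified-observation calculation for the simple model $\eta = \beta$ follows. It assumes the steady-state ($a1$) error case with known $\rho$ and $\sigma_u^2$; the function names `whiten` and `ml_mean_ar1` and the sample data are illustrative assumptions, not part of the text.

```python
def whiten(values, rho):
    """Apply D^-1: keep the first element, then v_i - rho * v_{i-1}."""
    return [values[0]] + [values[i] - rho * values[i - 1]
                          for i in range(1, len(values))]

def ml_mean_ar1(y, rho, sigma_u2=1.0):
    """b_ML and its variance for eta = beta with steady-state AR(1) errors."""
    n = len(y)
    z = whiten([1.0] * n, rho)      # modified sensitivities: 1, 1-rho, ..., 1-rho
    f = whiten(y, rho)              # modified observations F = D^-1 Y
    # phi^-1 = diag[(1 - rho^2), 1, ..., 1] / sigma_u2
    w = [(1.0 - rho ** 2) / sigma_u2] + [1.0 / sigma_u2] * (n - 1)
    zt_w_z = sum(wi * zi * zi for wi, zi in zip(w, z))
    zt_w_f = sum(wi * zi * fi for wi, zi, fi in zip(w, z, f))
    return zt_w_f / zt_w_z, 1.0 / zt_w_z

# Illustrative (made-up) observations
b, var_b = ml_mean_ar1([10.2, 10.5, 10.1, 9.7, 9.9, 10.3], rho=0.5)
print(b, var_b)
```

Because $\phi$ is diagonal, the weighted sums replace any explicit inversion of the full $n \times n$ matrix $\psi$, which is the practical point of working with $Z$ and $F$.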

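Another way to define the modified sensitivities and observations is by
using

$$Z \equiv \phi^{-1/2} D^{-1} X, \qquad F \equiv \phi^{-1/2} D^{-1} Y \tag{6.9.18}$$

which permits us to write

$$b_{ML} = (Z^T Z)^{-1} Z^T F, \qquad \operatorname{cov}(b_{ML}) = (Z^T Z)^{-1} \tag{6.9.19a, b}$$

where $b_{ML}$ has a similar form as given for OLS.

Results for Model $\eta = \beta$

For AR errors for the simple model $\eta = \beta$, the components of $Z$ are given
by

$$Z_1 = 1, \qquad Z_2 = Z_3 = \cdots = Z_n = 1 - \rho \tag{6.9.20a}$$

For case 1 AR errors, the $Z_i$ components for this same model are

$$Z_1 = \sigma_\epsilon^{-1}, \qquad Z_2 = Z_3 = \cdots = Z_n = \left[ (1-\rho)/(1+\rho) \right]^{1/2} \sigma_\epsilon^{-1} \tag{6.9.20b}$$

which are shown in Fig. 6.8 for $\rho$ = 0, 0.5, 0.9, and 1.0. The net effect of $\rho$
being between zero and one is to reduce the value of the modified
sensitivity compared to $X_i$. This results in the variance of $b$ being greater
than for $\rho = 0$.

For the simple model the $F_i$ values are

$$F_1 = Y_1, \quad F_2 = Y_2 - \rho Y_1, \quad F_3 = Y_3 - \rho Y_2, \ \ldots \tag{6.9.21a}$$

and the components of $\phi_{a1}$ are

$$\phi_{a1} = \sigma_u^2 \, \mathrm{diag}\left[ (1-\rho^2)^{-1} \quad 1 \quad 1 \quad \cdots \quad 1 \right] \tag{6.9.21b}$$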


Figure 6.8 Modified sensitivity coefficients for $a1$ errors and the model $\eta = \beta$.

Using (6.9.16b) with (6.9.20a) and (6.9.21) gives for the simple model
and $a1$ errors,

$$b_{ML} = \frac{(1-\rho^2) Y_1 + (1-\rho) \sum_{i=2}^{n} (Y_i - \rho Y_{i-1})}{1 - \rho^2 + (n-1)(1-\rho)^2} \tag{6.9.22}$$

which has the variance

$$V_{a1}(b_{ML}) = \frac{\sigma_u^2}{1 - \rho^2 + (n-1)(1-\rho)^2} = \frac{\sigma_\epsilon^2}{1 + (n-1)(1-\rho)/(1+\rho)} \tag{6.9.23}$$


which is depicted in Fig. 6.9 versus $n$ for various $\rho$ values. For any fixed
value of $\rho$ between zero and unity, $V_{a1}(b_{ML})$ always decreases with increasing
$n$ values. Physically this represents the case of constant spacing of
measurements but an increasing number of them. See Fig. 6.1 for example,
which is for the billet problem with time steps of 96 sec. If more
measurements are added with the same spacing of 96 sec, $n$ would be
increasing while $\rho$ would be fixed, as in Fig. 6.9.

Suppose that the observations become correlated as the time step is
reduced and that $\rho$ is related to $n$, the maximum number of observations,
by $\rho = \exp(-a/n)$. Then for variable $n$ and fixed $a$ we can obtain Fig.
6.10. Here as $n$ becomes large (about 20 for the case of $a = 1$), $V_{a1}(b_{ML})$
approaches a constant value. In this case increasing an already "large"
value of $n$ will not significantly improve the accuracy. This case of
correlated errors can occur as the time steps are made smaller and smaller
in a dynamic experiment. For example, if in Example 6.2.3 $\Delta t$ were
decreased from 96 sec to 48, then 24, and finally 12, while keeping the
same total time of 1536 sec, the observation errors would become more
and more correlated. This is demonstrated by the residuals shown in Fig.
6.6.

The asymptotic values of Fig. 6.10 for $n \to \infty$ are given by $2\sigma_\epsilon^2/(a+2)$.
(See Problem 5.26b.) Figure 6.7 depicts this relation as a function of $a$ as
well as the ratio of the variances for LS and ML estimators. The maximum
ratio is 1.139, occurring at about $a = 2.7$. For this first-order autoregressive error
model and simple physical model example, negligible improvement in
accuracy occurs using maximum likelihood (or Gauss-Markov estimation)
rather than ordinary least squares. Other cases can be exhibited, however,
for which ML estimators are far superior to OLS estimators.
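The limiting behavior described above can be explored with a short calculation. This is a sketch under stated assumptions: `v_ml_limit` is the asymptote $2\sigma_\epsilon^2/(a+2)$ quoted in the text, `v_ml_finite` is (6.9.23) with $\rho = e^{-a/n}$, and `v_ls_limit` is the large-$n$ limit obtained here by substituting $\rho = e^{-a/n}$ into the variance of Example 6.9.1 (an assumed form, derived rather than quoted). The grid search is only a crude way to locate the maximum ratio.

```python
import math

def v_ml_finite(n, a, sigma_e2=1.0):
    """ML variance for finite n with rho = exp(-a/n), sigma_e2 held constant."""
    rho = math.exp(-a / n)
    return sigma_e2 / (1.0 + (n - 1) * (1.0 - rho) / (1.0 + rho))

def v_ml_limit(a, sigma_e2=1.0):
    """Asymptotic ML variance, 2 * sigma_e2 / (a + 2)."""
    return 2.0 * sigma_e2 / (a + 2.0)

def v_ls_limit(a, sigma_e2=1.0):
    """Asymptotic LS variance (assumed form derived from Example 6.9.1)."""
    return 2.0 * sigma_e2 * (a - 1.0 + math.exp(-a)) / a ** 2

grid = [0.01 * k for k in range(1, 2001)]            # a from 0.01 to 20
ratios = [v_ls_limit(a) / v_ml_limit(a) for a in grid]
best = max(range(len(grid)), key=lambda k: ratios[k])
print("max LS/ML ratio:", round(ratios[best], 3), "near a =", round(grid[best], 2))
print("n = 20, a = 1:", round(v_ml_finite(20, 1.0), 3), "limit:", round(v_ml_limit(1.0), 3))
```

The small maximum ratio is consistent with the remark that ML offers negligible improvement over OLS for this particular error model and physical model.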


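6.9.3 Moving Average Errors (MA)

A brief treatment is given below for first-order moving average errors. A
model for first-order moving average errors is

$$\epsilon_i = u_i - \theta u_{i-1} \tag{6.9.24}$$

$$u_0 = 0, \qquad E(u_i) = 0 \tag{6.9.25}$$

$$E(u_i u_j) = 0 \quad \text{for } i \neq j \tag{6.9.26}$$

In matrix form $\epsilon$ can be written as

$$\epsilon = D_m u \tag{6.9.27}$$

where

$$
D_m = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
-\theta & 1 & 0 & \cdots & 0 \\
0 & -\theta & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\theta & 1
\end{bmatrix}
\tag{6.9.28}
$$

whose inverse is

$$
D_m^{-1} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\theta & 1 & 0 & \cdots & 0 \\
\theta^2 & \theta & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
\theta^{n-1} & \theta^{n-2} & \cdots & \theta & 1
\end{bmatrix}
\tag{6.9.29}
$$

Analogous to (6.9.5a), $\psi$ is given by

$$\psi = \operatorname{cov}(\epsilon) = D_m \phi D_m^T, \qquad \phi \equiv E(u u^T) \tag{6.9.30}$$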

In determining the covariance of $b_{LS}$ as given by (6.2.11), we have the term
$X^T \psi X$. By using (6.9.10) we see that the modified sensitivity given by

$$Z_m = D_m^T X \tag{6.9.31}$$

can be convenient to use. The $m$ subscript denotes moving average.
Components of $Z_m$ are

$$Z_{nj,m} = X_{nj}, \qquad Z_{i-1,j,m} = X_{i-1,j} - \theta X_{ij} \quad \text{for } i = 2, 3, \ldots, n \tag{6.9.32}$$

and for $j = 1, 2, \ldots, p$. When using ML estimation, we use the modified
sensitivity matrix,

$$Z_m = D_m^{-1} X \tag{6.9.33}$$

which has components

$$Z_{1j,m} = X_{1j}, \qquad Z_{ij,m} = X_{ij} + \theta Z_{i-1,j,m} \quad \text{for } i = 2, \ldots, n \tag{6.9.34}$$

Notice that the AR $Z$ for LS analysis is similar to the $Z$ for ML moving
average errors, whereas the AR $Z$ for ML analysis is similar to the LS MA
analysis.
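The recursion for the ML modified sensitivities with moving average errors can be sketched as follows. The function names and the sample column are illustrative assumptions; the model assumed is $\epsilon_i = u_i - \theta u_{i-1}$ with $u_0 = 0$, so that $D_m$ has ones on the diagonal and $-\theta$ on the first subdiagonal.

```python
def ma1_ml_sensitivity(x, theta):
    """One column of Z_m = D_m^-1 X: Z_1 = X_1, Z_i = X_i + theta * Z_{i-1}."""
    z = [x[0]]
    for i in range(1, len(x)):
        z.append(x[i] + theta * z[i - 1])
    return z

def apply_dm(z, theta):
    """Multiply a column by D_m: first element unchanged, then z_i - theta * z_{i-1}."""
    return [z[0]] + [z[i] - theta * z[i - 1] for i in range(1, len(z))]

x = [1.0, 2.0, 4.0, 8.0]            # illustrative sensitivity column
z = ma1_ml_sensitivity(x, theta=0.5)
print(z)                            # modified sensitivities
print(apply_dm(z, theta=0.5))       # multiplying by D_m recovers x
```

Applying $D_m$ to the result returns the original column, which confirms that the forward recursion is the action of $D_m^{-1}$ without forming the inverse matrix.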


6.9.4 Summary of First-Order Correlated Cases for the Model $\eta = \beta$

In Table 6.16 the variances of $b$ are given for six different correlated
errors; three of these are for autoregressive errors and the others are for
moving average errors. The variances are given for both LS and ML
estimators of $\beta$. Also the ratio of the variances for LS and ML estimation
is given for large values of $n$, the number of observations.

The following are some conclusions relative to correlated errors. 

1. If the measurement errors are actually correlated but are erroneously 
assumed to be independent, two deleterious effects can result. First, 
the experimenter might report much more accurate results than he 
should. Next, the experimental strategy might be based on an incorrect 
premise. That is, the accuracy of an estimate as measured by the 
variance always decreases as more independent measurements are 
included, but this is not necessarily true for correlated errors. Thus the 
experimenter might take 1000 observations expecting to achieve much 
more accuracy than if 100 were taken; it might be that the extra 900 
measurements are of dubious value for this purpose. 

2. For autoregressive errors with $0 < \rho < 1$, the variances of estimates can
be much greater than for independent errors ($\rho = 0$).



[Table 6.16. Columns: error model; least squares variance $V(b_{LS})$; maximum likelihood variance $V(b_{ML})$; ratio $V(b_{LS})/V(b_{ML})$ for large $n$.]


3. For some correlated error cases, the ML estimators can be greatly 
superior to those obtained using least squares. See the last column of 
Table 6.16. 

4. When the statistical parameters $\rho$ and $\theta$ are known, the estimation
procedure for maximum likelihood or generalized least squares is not
much more complicated for AR and MA errors than for independent
errors.

5. Since the covariance matrix of $b$ for both ML and LS estimation can
be readily given for AR and MA errors, the confidence intervals and
regions for known $\rho$ and $\theta$ can be developed in exactly the same
manner as in Section 6.8.


6.9.5 Simultaneous Estimation of $\rho$, $\sigma_u^2$, and Physical Parameters for the $a1$
Cases

When high speed digital data acquisition equipment is used, the measurements
on one specimen are frequently correlated. Neither the best model
to describe the correlations nor the statistical parameters such as $\rho$, $\theta$, and
$\sigma_u^2$ may be known. In this section the error model is assumed to be the
first-order autoregressive designated $a1$. The parameters $\rho$ and $\sigma_u^2$ are
unknown, however. Maximum likelihood estimation is used because, unlike
least squares, it can directly provide estimates of $\rho$ and $\sigma_u^2$.

The analysis starts with the logarithm of the likelihood function as given
by (6.1.67). For a single response case with the errors as described by
(6.9.1), the $\psi$ matrix is

$$\psi_{a1} = D \phi_{a1} D^T \tag{6.9.35}$$

where $D$ is given by (6.9.3) and $\phi_{a1}$ by

$$\phi_{a1} = \sigma_u^2 \, \mathrm{diag}\left[ (1-\rho^2)^{-1} \quad 1 \quad \cdots \quad 1 \right] \tag{6.9.36}$$

Then the determinant of $\psi_{a1}$ is

$$|\psi_{a1}| = \frac{\sigma_u^{2n}}{1-\rho^2} \tag{6.9.37}$$

Introducing (6.9.37) into (6.1.67a) gives

$$\ln L = -\frac{1}{2} \left[ n \ln 2\pi + \ln \frac{\sigma_u^{2n}}{1-\rho^2} + \frac{S_{a1}}{\sigma_u^2} \right] \tag{6.9.38a}$$

$$S_{a1} = (1-\rho^2)(Y_1 - \eta_1)^2 + \sum_{i=2}^{n} \left[ (Y_i - \rho Y_{i-1}) - (\eta_i - \rho \eta_{i-1}) \right]^2 \tag{6.9.38b}$$
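In the ML method we seek to maximize $\ln L$ simultaneously with respect
to the parameters $\beta$, $\sigma_u^2$, and $\rho$. Necessary conditions for the maximum are
that the first derivatives of $\ln L$ with respect to these parameters be equal
to zero. With $S_{a1}$ denoting the sum of squares of (6.9.38b), these conditions are

$$\left. \frac{\partial S_{a1}}{\partial \beta} \right|_{b_{ML}, \hat{\rho}} = 0 \tag{6.9.39a}$$

$$-\frac{n}{\hat{\sigma}_u^2} + \frac{S_{a1}(b_{ML}, \hat{\rho})}{\hat{\sigma}_u^4} = 0 \tag{6.9.39b}$$

$$\frac{2\hat{\rho}}{1-\hat{\rho}^2} + \frac{1}{\hat{\sigma}_u^2} \frac{\partial S_{a1}}{\partial \rho} = 0 \tag{6.9.39c}$$

Solving (6.9.39b) for $\hat{\sigma}_u^2$ yields

$$\hat{\sigma}_u^2 = \frac{1}{n} S_{a1}(b_{ML}, \hat{\rho}) \tag{6.9.40a}$$

which could be used directly to obtain a (biased) estimate of $\sigma_u^2$ if $\rho$ were
known. If $\rho$ is unknown, (6.9.39c) is employed to find

$$\hat{\rho} = \frac{\sum_{i=2}^{n} e_i e_{i-1}}{\sum_{i=2}^{n-1} e_i^2 + \hat{\sigma}_u^2/(1-\hat{\rho}^2)}, \qquad e_i \equiv Y_i - \hat{Y}_i \tag{6.9.40b}$$

where we have used (6.9.38b) to find

$$\frac{\partial S_{a1}}{\partial \rho} = -2\rho e_1^2 - 2 \sum_{i=2}^{n} e_{i-1} (e_i - \rho e_{i-1}) \tag{6.9.41}$$

The solution for $b_{ML}$, $\hat{\rho}$, and $\hat{\sigma}_u^2$ is nonlinear even though the model is linear
in the parameters other than $\rho$ and $\sigma_u^2$.

Various iterative procedures can be suggested to find $\hat{\rho}$, $\hat{\sigma}_u^2$, and $b_{ML}$. One
is to first guess a reasonable value of $\rho$ (such as $\rho = 0$) and then calculate
$b_{ML}$ using (6.9.39a) or equivalently (6.5.2). Next, (6.9.40a) is used to get a
value for $\hat{\sigma}_u^2$, which in turn is used in (6.9.40b) to obtain an improved value
of $\hat{\rho}$. The procedure is repeated until $\hat{\rho}$ is essentially unchanged. The
converged values of $\hat{\rho}$, $\hat{\sigma}_u^2$, and $b_{ML}$ are the desired values.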


Example 6.9.3

Give the set of iterative equations for the model $\eta_i = \beta$ and $a1$ autoregressive errors.

Solution

The estimator for $b_{ML}$ is given by (6.9.22) with $\rho$ replaced by $\hat{\rho}$. The estimated
variance of $u_i$ is found using (6.9.40a). The estimate for $\rho$ is given by (6.9.40b)
with $\hat{Y}_i$ replaced by $b_{ML}$. These equations must be solved iteratively even for
this simple problem.

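Example 6.9.4

For the $a3$ autoregressive case investigate using the Monte Carlo method the
estimates of $b_1$, $b_2$, $\rho$, and $\sigma_u^2$ for the model $\eta_i = 100 + 0.1 X_i$ where $X_i = 0, 10, 20, \ldots$
and the $u_i$ are chosen from random normal numbers of unit variance and zero mean.
For $\rho = 1$, let $n$ = 10, 30, 40, 50, and 120. For $n = 60$, let $\rho$ = -1, -0.5, 0, 0.5, and 1.

Solution

For this $a3$ case a solution for a given set of measurements is found in a similar
manner as for the $a1$ case discussed above. The main difference is that the
determinant of $\psi_{a3}$ is $\sigma_u^{2n}$ rather than the expression given by (6.9.37). The result is
the set of equations

$$b_{ML} = (Z^T Z)^{-1} Z^T F, \qquad Z = D^{-1} X, \qquad F = D^{-1} Y \tag{a}$$

$$\hat{\sigma}_u^2 = \frac{1}{n} \sum_{i=1}^{n} (e_i - \hat{\rho} e_{i-1})^2 \quad \text{for } e_0 = 0 \tag{b}$$

$$\hat{\rho} = \frac{\sum_{i=2}^{n} e_i e_{i-1}}{\sum_{i=2}^{n} e_{i-1}^2} \tag{c}$$

where $e_i = Y_i - \hat{Y}_i$. The diagonal matrix $\phi_{a3}^{-1}$ is not needed in (a) because it is equal to $\sigma_u^{-2} I$ and the
$\sigma_u^{-2}$ term cancels. The $Z$ matrix is equal to

$$
Z = \begin{bmatrix}
1 & & & \\
-\rho & 1 & & \\
 & -\rho & 1 & \\
 & & \ddots & \ddots \\
 & & -\rho & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
1 & 10 \\
1 & 20 \\
\vdots & \vdots \\
1 & (n-1)10
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 \\
1-\rho & 10 \\
1-\rho & 20 - 10\rho \\
\vdots & \vdots \\
1-\rho & 10[(n-1) - (n-2)\rho]
\end{bmatrix}
$$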

One array of measurements is generated by using $Y_i = \eta_i + \epsilon_i$ where $\epsilon_i = \rho \epsilon_{i-1} + u_i$
and the $u_i$ are found from a table of normal random numbers with unit variance
and zero mean. The Monte Carlo analysis is obtained by generating a large number
of sets of $b_1$, $b_2$, $\hat{\rho}$, and $\hat{\sigma}_u^2$, where each set corresponds to an array of measurements.
Table 6.17 is a summary of results of the Monte Carlo analysis for $\rho = 1$ with $n$ = 10
to 120, and Table 6.18 summarizes results for $n = 60$ and $\rho$ = -1 to 1. For $n$ = 10 to
60 there were 34 sets of random data for each $n$; for $n = 120$, 17 sets were used. (In
general many more than even 34 sets of data should be used.) The terms with bars
over them are average values found from the Monte Carlo simulation.
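A small Monte Carlo experiment in the spirit of Example 6.9.4 can be sketched as follows. Everything specific here is an assumption made for illustration: the function name `fit_line_a3`, the fixed number of iterations, the pseudo-random generator and its seed, and the choice $\rho = 0.5$ with $n = 60$ and 34 data sets. The results will therefore not reproduce the tabulated values exactly.

```python
import random

def fit_line_a3(x, y, n_iter=25):
    """Iterate equations (a)-(c) for eta = b1 + b2*X with a3 errors (e_0 = 0)."""
    n = len(y)
    rho = 0.0
    for _ in range(n_iter):
        # Modified sensitivities and observations: Z = D^-1 X, F = D^-1 Y.
        z1 = [1.0] + [1.0 - rho] * (n - 1)
        z2 = [x[0]] + [x[i] - rho * x[i - 1] for i in range(1, n)]
        f = [y[0]] + [y[i] - rho * y[i - 1] for i in range(1, n)]
        # Solve the 2x2 normal equations (Z^T Z) b = Z^T F.
        a11 = sum(v * v for v in z1)
        a12 = sum(u * v for u, v in zip(z1, z2))
        a22 = sum(v * v for v in z2)
        c1 = sum(u * v for u, v in zip(z1, f))
        c2 = sum(u * v for u, v in zip(z2, f))
        det = a11 * a22 - a12 * a12
        b1 = (a22 * c1 - a12 * c2) / det
        b2 = (a11 * c2 - a12 * c1) / det
        # Residuals and the updated rho from equation (c).
        e = [y[i] - b1 - b2 * x[i] for i in range(n)]
        rho = sum(e[i] * e[i - 1] for i in range(1, n)) / sum(e[i - 1] ** 2 for i in range(1, n))
    # Equation (b) with e_0 = 0.
    sigma_u2 = (e[0] ** 2 + sum((e[i] - rho * e[i - 1]) ** 2 for i in range(1, n))) / n
    return b1, b2, rho, sigma_u2

rng = random.Random(1)
n, true_rho = 60, 0.5
x = [10.0 * i for i in range(n)]
rho_hats, b2_hats = [], []
for _ in range(34):
    eps, y = 0.0, []
    for i in range(n):
        eps = true_rho * eps + rng.gauss(0.0, 1.0)   # e_0 = 0, so the first error is u_1
        y.append(100.0 + 0.1 * x[i] + eps)
    b1, b2, rho_hat, sigma_u2 = fit_line_a3(x, y)
    rho_hats.append(rho_hat)
    b2_hats.append(b2)

mean_rho = sum(rho_hats) / len(rho_hats)
mean_b2 = sum(b2_hats) / len(b2_hats)
print(round(mean_rho, 3), round(mean_b2, 4))
```

Averaging over the sets mirrors the barred quantities in Tables 6.17 and 6.18; as the text cautions, many more than 34 sets would be needed for reliable conclusions.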


Table 6.17 Summary of Monte Carlo Investigation for $\rho = 1$ for Example 6.9.4

                                                          n
                                        10         30         40         50         120
$\bar{\hat{\rho}}$                      0.362      0.714      0.783      0.855      0.932
$\bar{\hat{\sigma}}_u^2$                0.803      0.888      0.886      0.928      0.953
$(\bar{b}_1 - \beta_1)/\beta_1$        -0.00142   -0.00047    0.00001    0.00039    0.00058
est. s.e.$(b_1)$/s.e.$(b_1)$            0.837      0.831      0.837      0.901      0.932
$(\bar{b}_2 - \beta_2)/\beta_2$         0.0663    -0.00186    0.0013     0.0045    -0.00003
est. s.e.$(b_2)$/s.e.$(b_2)$            0.426      0.327      0.316      0.363      0.360
est. cov$(b_1, b_2)$                   -0.0091    -0.0030    -0.0023    -0.0020    -0.0009
runs$/n$ for $D^{-1}e$                  0.512      0.472      0.486      0.481      0.491


Table 6.18 Summary of Monte Carlo Investigation for $n = 60$ for Example 6.9.4

                                                          $\rho$
                                        -1         -0.5        0          0.5        1
$\bar{\hat{\rho}}$                     -0.969     -0.485     -0.024      0.430      0.881
$\bar{\hat{\sigma}}_u^2$                0.958      0.966      0.966      0.963      0.933
$(\bar{b}_1 - \beta_1)/\beta_1$         0.00013    0.00016    0.00020    0.00014    0.00012
est. s.e.$(b_1)$/s.e.$(b_1)$            1.024      1.029      1.005      0.938      0.912
$(\bar{b}_2 - \beta_2)/\beta_2$        -0.00048   -0.00060   -0.00080   -0.00093   -0.00048
est. s.e.$(b_2)$/s.e.$(b_2)$            0.997      1.004      0.981      0.917      0.353
est. cov$(b_1, b_2)$                   -0.00004   -0.00008   -0.00016   -0.00049   -0.00175
runs$/n$ for $D^{-1}e$                  0.523      0.516      0.500      0.493      0.490




Let us examine Table 6.17. The relative errors in $\hat{\rho}$, $\hat{\sigma}_u^2$, $b_1$, and $b_2$ decrease as $n$ is
increased. The first two estimates appear to be biased. It is not clear whether $b_1$
and $b_2$ are; however, except for $b_2$ with $\rho = 1$ and $n = 10$, the biases indicated by
$(\bar{b}_1 - \beta_1)/\beta_1$ and $(\bar{b}_2 - \beta_2)/\beta_2$ are very small. The average estimated standard error
of $b_1$ divided by the true value is biased but not nearly as much as the same ratio
for $b_2$, which is about 0.36. This means that confidence regions based on these
estimated standard errors will be too small, or in other words, the parameter
estimates would be presented as being more accurate than they really are. For $\rho = 1$
in this example, the true value of the covariance of $b_1$ and $b_2$ is zero while Table
6.17 shows small negative values. The average number of runs for the modified
residuals, $D^{-1}(Y - \hat{Y})$, is nearly $n/2$, which is close to the $(n+1)/2$ value expected
for independent observations.

From Table 6.18 most of the same conclusions as drawn from Table 6.17 are
valid. An additional one is that only the cumulative error case ($\rho = 1$) for $b_2$ has a
much smaller estimated standard error ratio than for the other $\rho$ values listed. For
further discussion, see reference 23.


REFERENCES 

1. Hildebrand, F. B., Methods of Applied Mathematics, 2nd ed., Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.

2. Deutsch, R., Estimation Theory, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.

3. Brownlee, K. A., Statistical Theory and Methodology in Science and Engineering, 2nd ed., John Wiley & Sons, Inc., New York, 1965.

4. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966.

5. Daniel, C. and Wood, F. S., Fitting Equations to Data, Wiley-Interscience, New York, 1971.

6. Box, G. E. P. and Draper, N. R., "The Bayesian Estimation of Common Parameters from Several Responses," Biometrika 52 (1965), 355-365.

7. Hunter, W. G., "Estimation of Unknown Constants from Multiresponse Data," Ind. Eng. Chem. 6 (1967), 461.

8. Welty, J. R., Engineering Heat Transfer, John Wiley & Sons, Inc., New York, 1974.

9. Goldfeld, S. M. and Quandt, R. E., Nonlinear Methods in Econometrics, North-Holland Publishing Company, Amsterdam, 1972.

10. Himmelblau, D. M., Process Analysis by Statistical Methods, John Wiley & Sons, Inc., New York, 1970.

11. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables, 2nd ed., McGraw-Hill Book Company, New York, 1970.

12. Beyer, W. H., Handbook of Tables for Probability and Statistics, 2nd ed., The Chemical Rubber Co., Cleveland, Ohio, 1968.

13. Rice, J. R., The Approximations of Functions, Vol. 1: Linear Theory, Addison-Wesley Publishing Co., Reading, Mass., 1964.

14. Myers, R. H., Response Surface Methodology, Allyn and Bacon, Inc., Boston, 1971.

15. Beck, J. V. and Al-Araji, S., "Investigation of a New Simple Transient Method of Thermal Property Measurement," J. Heat Transfer, Trans. ASME, Ser. C 96 (1974), 59-64.

16. Mendel, J. M., Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, Inc., New York, 1973.

17. Hoerl, A. E., "Application of Ridge Analysis to Regression Problems," Chem. Eng. Progr. 55 (1962), 54-59.

18. Hoerl, A. E., "Ridge Analysis," Chem. Eng. Progr., Symposium Series Vol. 60 (1961), 67-77.

19. Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics 12 (1970), 55-67.

20. Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Applications to Nonorthogonal Problems," Technometrics 12 (1970), 69-82.

21. Marquardt, D. W., "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation," Technometrics 12 (1970), 591-612.

22. Swed, F. S. and Eisenhart, C., "Tables for Testing Randomness of Grouping in a Sequence of Alternatives," Ann. Math. Stat. 14 (1943), 66-87.

23. Beck, J. V., "Parameter Estimation with Cumulative Errors," Technometrics 16 (1974), 85-92.

24. U.S. Bureau of the Census, Statistical Abstract of the United States: 1974, 95th Annual Edition, Washington, D.C., 1974.


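APPENDIX 6A AUTOREGRESSIVE MEASUREMENT ERRORS

Consider the second-order model of autoregressive (AR) measurement
errors,

$$\epsilon_i = \rho_{i1} \epsilon_{i-1} + \rho_{i2} \epsilon_{i-2} + u_i \tag{6A.1}$$

for which

$$\epsilon_0 = \epsilon_{-1} = 0, \qquad u_i \sim N(0, \sigma_i^2) \tag{6A.2}$$

$$E(u_i u_j) = 0 \quad \text{for } i \neq j \tag{6A.3}$$

Let us write out the first few terms of (6A.1),

$$\epsilon_1 = u_1$$

$$\epsilon_2 = \rho_{21} \epsilon_1 + u_2 = \rho_{21} u_1 + u_2$$

$$\epsilon_3 = \rho_{31} \epsilon_2 + \rho_{32} \epsilon_1 + u_3 = \rho_{31} \rho_{21} u_1 + \rho_{31} u_2 + \rho_{32} u_1 + u_3 = (\rho_{31} \rho_{21} + \rho_{32}) u_1 + \rho_{31} u_2 + u_3$$

Then in general we can write the $\epsilon$ vector in terms of the $u$ vector as

$$\epsilon = D_a u \tag{6A.4a}$$

where

$$\epsilon^T = [\epsilon_1 \ \epsilon_2 \cdots \epsilon_n], \qquad u^T = [u_1 \ u_2 \cdots u_n] \tag{6A.4b}$$

$$
D_a = \begin{bmatrix}
1 & & & \\
\rho_{21} & 1 & & \\
\rho_{31} \rho_{21} + \rho_{32} & \rho_{31} & 1 & \\
\vdots & & & \ddots
\end{bmatrix}
\tag{6A.4c}
$$

The lower triangular matrix $D_a$ becomes cumbersome as the dimensions
$n \times n$ become large. The inverse of $D_a$ can be found relatively easily,
however. Write out (6A.1) for $u_i$ as

$$u_1 = \epsilon_1$$

$$u_2 = -\rho_{21} \epsilon_1 + \epsilon_2$$

$$u_3 = -\rho_{32} \epsilon_1 - \rho_{31} \epsilon_2 + \epsilon_3$$

or

$$u = D_a^{-1} \epsilon \tag{6A.5a}$$

where $D_a^{-1}$ is a square matrix with three nonzero diagonals,

$$
D_a^{-1} = \begin{bmatrix}
1 & & & & \\
-\rho_{21} & 1 & & & \\
-\rho_{32} & -\rho_{31} & 1 & & \\
0 & -\rho_{42} & -\rho_{41} & 1 & \\
\vdots & & \ddots & \ddots & \ddots \\
0 & 0 & \cdots & -\rho_{n2} \quad -\rho_{n1} & 1
\end{bmatrix}
\tag{6A.5b}
$$

The covariance matrix of the errors can be written as

$$\psi = E[\epsilon \epsilon^T] = E(D_a u u^T D_a^T) = D_a \phi D_a^T \tag{6A.6a}$$

where $\phi$ is the diagonal matrix

$$\phi = \mathrm{diag}[\sigma_1^2 \ \sigma_2^2 \cdots \sigma_n^2] \tag{6A.6b}$$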

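The inverse of $\psi$ is

$$\psi^{-1} = (D_a^{-1})^T \phi^{-1} D_a^{-1} \tag{6A.7}$$

where the inverse of $D_a$ has the relatively simple form given by (6A.5b).
Because of the form of $D_a$, which is lower triangular with ones along the
diagonal, the determinant of $D_a$ is unity. Hence the determinant of $\psi$ is

$$|\psi| = |\phi| = \prod_{i=1}^{n} \sigma_i^2 \tag{6A.8}$$

For maximum likelihood estimation we know that

$$b_{ML} = (X^T \psi^{-1} X)^{-1} X^T \psi^{-1} Y \tag{6A.9a}$$

$$\operatorname{cov}(b_{ML}) = (X^T \psi^{-1} X)^{-1} \tag{6A.9b}$$

for the linear model $\eta = X\beta$. By defining

$$Z_a = D_a^{-1} X, \qquad F_a = D_a^{-1} Y \tag{6A.10}$$

the ML relations in (6A.9) can be written in the weighted least squares
form of

$$b_{ML} = (Z_a^T \phi^{-1} Z_a)^{-1} Z_a^T \phi^{-1} F_a \tag{6A.11a}$$

$$\operatorname{cov}(b_{ML}) = (Z_a^T \phi^{-1} Z_a)^{-1} \tag{6A.11b}$$

Because of the simple form of the inverse of $D_a$, the terms in $Z_a$ and $F_a$ are

$$Z_{ij,a} = X_{ij} - \rho_{i1} X_{i-1,j} - \rho_{i2} X_{i-2,j} \tag{6A.12a}$$

$$F_{i,a} = Y_i - \rho_{i1} Y_{i-1} - \rho_{i2} Y_{i-2} \tag{6A.12b}$$

Also because $\phi$ is a diagonal matrix, the matrices in (6A.11) can be written
in summation form as

$$[Z_a^T \phi^{-1} Z_a]_{jk} = \sum_{i=1}^{n} Z_{ij,a} Z_{ik,a} \sigma_i^{-2}, \qquad j = 1, 2, \ldots, p \text{ and } k = 1, 2, \ldots, p \tag{6A.13a}$$

$$[Z_a^T \phi^{-1} F_a]_{j} = \sum_{i=1}^{n} Z_{ij,a} F_{i,a} \sigma_i^{-2}, \qquad j = 1, 2, \ldots, p \tag{6A.13b}$$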

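For nonlinear estimation problems the sum of squares function is of
interest. It is given by

$$S_{ML} = (Y - \eta)^T \psi^{-1} (Y - \eta) = (F_a - H_a)^T \phi^{-1} (F_a - H_a) \tag{6A.14a}$$

where

$$H_a = D_a^{-1} \eta \tag{6A.14b}$$

Again because $\phi$ is diagonal, we can write the relatively simple summation

$$S_{ML} = \sum_{i=1}^{n} (F_{i,a} - H_{i,a})^2 \sigma_i^{-2} \tag{6A.15a}$$

$$S_{ML} = \sum_{i=1}^{n} \left[ (Y_i - \rho_{i1} Y_{i-1} - \rho_{i2} Y_{i-2}) - (\eta_i - \rho_{i1} \eta_{i-1} - \rho_{i2} \eta_{i-2}) \right]^2 \sigma_i^{-2} \tag{6A.15b}$$

The above analysis is readily modified for first-order autoregressive
cases by setting $\rho_{i2}$ equal to zero. Higher-order cases are also treated
simply by adding terms in $Z_a$ and $F_a$.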

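Covariance Matrices for First-Order Autoregressive Errors with Constant $\rho$

Some covariance matrices $\psi$ for the first-order AR case with constant $\rho$
and

$$\sigma_1^2 = c \sigma_u^2, \qquad \sigma_i^2 = \sigma_u^2 \quad \text{for } i = 2, 3, \ldots, n \tag{6A.16}$$

are given below. For this case $D_a$ and $\phi$ are given by

$$
D_a = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
\rho & 1 & 0 & 0 & \cdots & 0 \\
\rho^2 & \rho & 1 & 0 & \cdots & 0 \\
\rho^3 & \rho^2 & \rho & 1 & \cdots & 0 \\
\vdots & & & & \ddots & \vdots \\
\rho^{n-1} & \cdots & & & \rho & 1
\end{bmatrix},
\qquad
\phi = \mathrm{diag}[c\sigma_u^2 \ \sigma_u^2 \cdots \sigma_u^2]
\tag{6A.17}
$$

and then $\psi$ can be multiplied out to get

$$
\psi = \sigma_u^2 \begin{bmatrix}
c & c\rho & c\rho^2 & c\rho^3 & \cdots \\
 & 1 + c\rho^2 & \rho(1 + c\rho^2) & \rho^2(1 + c\rho^2) & \cdots \\
 & & 1 + \rho^2 + c\rho^4 & \rho(1 + \rho^2 + c\rho^4) & \cdots \\
\text{symmetric} & & & 1 + \rho^2 + \rho^4 + c\rho^6 & \cdots \\
 & & & & \ddots
\end{bmatrix}
\tag{6A.18}
$$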


Consider the terms in the square matrix in (6A.18). Notice that the
diagonal term of the $j$th row, which we designate $G_j$, also appears as a
product in the $j$th row for the $(n - j)$ succeeding terms. For $\rho^2 \neq 1$, $G_j$
is given by

$$G_j = 1 + \rho^2 + \cdots + \rho^{2(j-1)} + (c-1)\rho^{2(j-1)} = \frac{1 - \rho^{2j}}{1 - \rho^2} + (c-1)\rho^{2(j-1)} \tag{6A.19}$$

As $j \to \infty$ and $\rho^2 < 1$, $G_j$ approaches the value $(1-\rho^2)^{-1}$. This could be
called the steady-state value of the variance of $\epsilon_j/\sigma_u$. Evaluating the
difference between successive diagonal terms of $G_j$ for any $\rho^2$ yields

$$\Delta G_j \equiv G_{j+1} - G_j = \rho^{2(j-1)} \left[ 1 - c(1-\rho^2) \right] \tag{6A.20}$$

At least five cases can be identified using (6A.20).

Case 1.

$$\Delta G_j = 0; \quad c = (1-\rho^2)^{-1}, \quad \rho^2 < 1 \quad \text{(steady-state case)} \tag{6A.21a}$$

Case 2.

$$\Delta G_j = 1; \quad \rho = 1 \tag{6A.21b}$$

Case 3.

$$\Delta G_j = \rho^{2j}; \quad c = 1 \tag{6A.21c}$$

Case 4.

$$\Delta G_j > 0; \quad c < (1-\rho^2)^{-1}, \quad \rho^2 < 1 \tag{6A.21d}$$

Case 5.

$$\Delta G_j < 0; \quad c > (1-\rho^2)^{-1}, \quad \rho^2 < 1 \tag{6A.21e}$$

In each of these cases except case 2 and if $\rho^2 < 1$, the value of $G_\infty$ is the
same, namely, $(1-\rho^2)^{-1}$. In Case 1, $G_j$ is this value of $(1-\rho^2)^{-1}$ for all $j$.
In Case 4, $G_j$ increases monotonically to this value whereas in Case 5, $G_1$ is
the largest value and $G_j$ decreases asymptotically to $G_\infty$. Hence it is possible
to have constant, increasing, or decreasing $V(\epsilon_j)$, depending on whether
$c = (1-\rho^2)^{-1}$, $c < (1-\rho^2)^{-1}$, or $c > (1-\rho^2)^{-1}$, respectively. If $\rho = 1$ as in
case 2, $G_j = j$, which clearly has no steady-state value; Case 2 is called the
cumulative error case. If $\rho^2 > 1$, the $G_j$ values grow very rapidly with $j$.
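Some matrices for the above cases are

$$
\psi_{a1} = \frac{\sigma_u^2}{1-\rho^2} \begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
 & 1 & \rho & \cdots & \rho^{n-2} \\
 & & 1 & \cdots & \rho^{n-3} \\
\text{symmetric} & & & \ddots & \vdots \\
 & & & & 1
\end{bmatrix}
\tag{6A.22a}
$$

$$
\psi_{a2} = \sigma_u^2 \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
 & 2 & 2 & \cdots & 2 \\
 & & 3 & \cdots & 3 \\
\text{symmetric} & & & \ddots & \vdots \\
 & & & & n
\end{bmatrix}
\tag{6A.22b}
$$

$$
\psi_{a3} = \sigma_u^2 \begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
 & 1+\rho^2 & \rho(1+\rho^2) & \cdots & \rho^{n-2}(1+\rho^2) \\
 & & 1+\rho^2+\rho^4 & \cdots & \rho^{n-3}(1+\rho^2+\rho^4) \\
\text{symmetric} & & & \ddots & \vdots
\end{bmatrix}
\tag{6A.22c}
$$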
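The behavior of the diagonal terms $G_j$ in the five cases can be checked with a short calculation. This is a sketch with hypothetical function names: `g_recursive` uses the relation $G_{j+1} = 1 + \rho^2 G_j$ with $G_1 = c$, which follows from $\epsilon_{j+1} = \rho \epsilon_j + u_{j+1}$, and `g_closed` codes the closed form with the $(c-1)\rho^{2(j-1)}$ correction term for $\rho^2 \neq 1$.

```python
def g_recursive(j_max, rho, c):
    """G_1..G_jmax from G_1 = c and G_{j+1} = 1 + rho^2 * G_j."""
    g = [c]
    for _ in range(j_max - 1):
        g.append(1.0 + rho ** 2 * g[-1])
    return g

def g_closed(j, rho, c):
    """Closed form for rho^2 != 1 (j counted from 1)."""
    r2 = rho ** 2
    return (1.0 - r2 ** j) / (1.0 - r2) + (c - 1.0) * r2 ** (j - 1)

rho = 0.5
steady_c = 1.0 / (1.0 - rho ** 2)
for label, c in (("case 1", steady_c), ("case 3", 1.0), ("case 4", 0.5), ("case 5", 3.0)):
    g = g_recursive(6, rho, c)
    print(label, [round(v, 4) for v in g])

print("case 2", g_recursive(6, 1.0, 1.0))   # cumulative errors: G_j = j
```

The printed sequences are constant for case 1, increasing toward $(1-\rho^2)^{-1}$ for cases 3 and 4, decreasing toward it for case 5, and growing without bound for case 2.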



APPENDIX 6B MATRIX INVERSION LEMMA

Important relations for inverses occurring in parameter estimation are
derived below. Let $A$ be $p \times m$ and $B$ be $m \times p$. Let $I_m$ be the $m \times m$
identity matrix.

An identity and some rearrangements of it are as follows:

$$A(I_m + BA) = (I_p + AB)A \tag{6B.1}$$

$$AB = (I_p + AB) A (I_m + BA)^{-1} B \tag{6B.2}$$

$$I_p = (I_p + AB) - (I_p + AB) A (I_m + BA)^{-1} B \tag{6B.3}$$

Premultiplying (6B.3) by $(I_p + AB)^{-1}$ yields

$$(I_p + AB)^{-1} = I_p - A (I_m + BA)^{-1} B \tag{6B.4}$$

Let $P$ be defined by

$$P \equiv \left[ X^T \psi^{-1} X + V_\beta^{-1} \right]^{-1} = \left[ V_\beta X^T \psi^{-1} X + I \right]^{-1} V_\beta \tag{6B.5}$$

Using (6B.4) for (6B.5) with $A = V_\beta X^T \psi^{-1}$ and $B = X$ gives

$$P = \left[ I - V_\beta X^T \psi^{-1} (I + X V_\beta X^T \psi^{-1})^{-1} X \right] V_\beta$$

or

$$P = V_\beta - V_\beta X^T (X V_\beta X^T + \psi)^{-1} X V_\beta \tag{6B.6}$$

This equation is called the matrix inversion lemma. Equation 6.7.5a is
found from (6B.6) by letting

$$P \to P_{i+1}, \qquad X \to X_{i+1}, \ \ldots$$

Equation (6B.1) can be written as

$$(I_p + AB)^{-1} A = A (I_m + BA)^{-1} \tag{6B.7}$$

Using the substitutions suggested above for $A$ and $B$ results in

$$P X^T \psi^{-1} = V_\beta X^T (X V_\beta X^T + \psi)^{-1} \tag{6B.8}$$
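The matrix inversion lemma is easy to verify numerically on a small case. The sketch below is illustrative only: the helper functions are written out in plain Python so the block is self-contained, and the matrices `X`, `psi`, and `V_beta` are made-up values (with `psi` and `V_beta` chosen symmetric and positive definite). It compares the direct definition of $P$ with the right side of (6B.6), and also checks (6B.8).

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]                       # 3 observations, 2 parameters
psi = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]]      # error covariance (made up)
V_beta = [[4.0, 1.0], [1.0, 3.0]]                              # prior covariance (made up)
Xt = transpose(X)

# Direct definition: P = [X^T psi^-1 X + V_beta^-1]^-1
P_direct = inverse(add(matmul(matmul(Xt, inverse(psi)), X), inverse(V_beta)))

# Lemma: P = V_beta - V_beta X^T (X V_beta X^T + psi)^-1 X V_beta
gain = matmul(matmul(V_beta, Xt), inverse(add(matmul(matmul(X, V_beta), Xt), psi)))
P_lemma = add(V_beta, matmul(matmul(gain, X), V_beta), sign=-1.0)

max_diff = max(abs(p - q) for rp, rq in zip(P_direct, P_lemma) for p, q in zip(rp, rq))
print(max_diff)
```

The direct form inverts a $p \times p$ matrix and the lemma form an $n \times n$ matrix; which is cheaper depends on the relative sizes, and the lemma form is the one suited to adding observations a few at a time.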


PROBLEMS 


6.1 Evaluate the following matrix products 

(a) (b) 

r 1 0 


■ 2 

-1 ■ 

■ 1 

0 1 

■ 

. -1 

2 . 

1 

-1 1 , 



(c) (d) 



(e) if) 


C| 0 



^11 

o 

1 

0 c, 

Oji 022 


On 022 

[ 0 C2 


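6.2 For $X$ given by

$$
X = \begin{bmatrix}
1 & 1 & 1 \\
1 & 2 & 4 \\
1 & 3 & 9 \\
1 & 4 & 16
\end{bmatrix}
$$

and $D^{-1}$ with $\rho = 0.5$ in (6.9.4), evaluate

$$X^T \Omega^{-1} X, \quad \text{where } \Omega = D D^T$$

Choose the proper size for $D^{-1}$.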

6.3 Prove for A and B being nonsingular square matrices

6.4 A commonly given method for finding the inverse of a square matrix $A$
involves the use of cofactors. A cofactor $A_{ij}$ is $(-1)^{i+j} D_{ij}$, where $D_{ij}$ is called a
minor and is obtained by taking the determinant of the submatrix formed by
striking out the row and column corresponding to the element $a_{ij}$. The inverse
of $A$ contains the elements $A_{ji}/|A|$. Using this expression for the inverse,
verify (6.1.10) and (6.1.11).

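6.5 For Model 5 of Chapter 5, $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, show that the matrices $(X^T X)^{-1}$
and $X^T Y$ used in OLS are

$$
(X^T X)^{-1} = \frac{1}{\Delta} \begin{bmatrix}
\sum X_{i2}^2 & -\sum X_{i1} X_{i2} \\
-\sum X_{i1} X_{i2} & \sum X_{i1}^2
\end{bmatrix},
\qquad
X^T Y = \begin{bmatrix} \sum X_{i1} Y_i \\ \sum X_{i2} Y_i \end{bmatrix}
$$

where $\Delta = \left( \sum X_{i1}^2 \right) \left( \sum X_{i2}^2 \right) - \left( \sum X_{i1} X_{i2} \right)^2$.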





66 (j) For Model 5 of Chapter 5 give the components X^^“'XandXV ‘Y,of 
b^L g'S'en by (6 5 2) Let ^ 'be given by W where the components are 
and for i?fc/ 

(6) Using the results of ia\ write out the equation for bi ml Check your 
answer by letting ^ and usmg the Table 5 3 results for Model 5 

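6.7 Find $(X^T X)^{-1}$ for the cases

(a) $Y_i = \beta_1 + \beta_2 X_i + \epsilon_i$ with $X_1 = 1$, $X_2 = 2$, $X_3 = 3$, and $X_4 = 4$.

(b) $Y_i = \beta_1 X_i + \beta_2 X_i^2 + \epsilon_i$ with $X_i$ = -2, -1, 0, 1, and 2.

(c) $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$ for $i = 1, 2, \ldots, 9$, with

$X_{11} = X_{41} = X_{71} = -1$, $X_{21} = X_{51} = X_{81} = 0$, $X_{31} = X_{61} = X_{91} = 1$

$X_{12} = X_{22} = X_{32} = -1$, $X_{42} = X_{52} = X_{62} = 0$, $X_{72} = X_{82} = X_{92} = 1$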

6.8 In Welty [8, p. 247] the following are recommended coordinates for natural
convection from horizontal cylinders to liquids and gases. Also given are the
logarithms to the base 10.

$\mathrm{Nu}_D$     $\mathrm{Gr}_D \mathrm{Pr}$     $\log \mathrm{Nu}_D$     $\log \mathrm{Gr}_D \mathrm{Pr}$
0.490      $10^{-4}$     -0.3098     -4
0.550      $10^{-3}$     -0.2596     -3
0.661      $10^{-2}$     -0.1798     -2
0.841      $10^{-1}$     -0.0752     -1
1.08       $1$            0.0334      0
1.51       $10$           0.1790      1
2.11       $10^{2}$       0.3243      2
3.16       $10^{3}$       0.4997      3
5.37       $10^{4}$       0.7300      4
9.33       $10^{5}$       0.9699      5
16.2       $10^{6}$       1.2095      6
28.8       $10^{7}$       1.4594      7
51.3       $10^{8}$       1.7101      8
93.3       $10^{9}$       1.9699      9

Let $Y = \log \mathrm{Nu}_D$ and $X = \log \mathrm{Gr}_D \mathrm{Pr}$.


(a) Using the last seven data pairs in the above data with orthogonal
polynomials, find $a_0, a_1, \ldots, a_4$.

Answer. 1.2212, 0.2450, $2.644 \times 10^{-3}$, $1.6667 \times 10^{-4}$, $5.4545 \times 10^{-5}$.

(b) Find the sum of squares for each of the models that can be obtained from
the results of (a).

Answer. 1.6814, $5.95 \times 10^{-4}$, $8.21 \times 10^{-6}$, $8.15 \times 10^{-6}$, $6.80 \times 10^{-6}$.

(c) Assume that the assumptions 11111011 are valid. At the 5% level of
significance give a recommended model in terms of $\mathrm{Nu}_D$ and $\mathrm{Gr}_D \mathrm{Pr}$. How
does your model compare with

$$\mathrm{Nu}_D = 0.53 (\mathrm{Gr}_D \mathrm{Pr})^{1/4}$$

which is often used for $10^4 < \mathrm{Gr}_D \mathrm{Pr} < 10^9$?

6.9 Repeat Problem 6.8 for the first seven data pairs.

6.10 Using all the data of Problem 6.8, repeat Problem 6.8. Use the orthogonal
tables in Beyer [12, p. 505] or some other book.

Using the last seven data pairs of Problem 6.8 with F=Nu 0 and A^=Gr 0 Pr, 
estimate, using OLS, 

(n) /Ij and ^2 in y, = j 8 i + yS 2 AL, + c,. 





(6) and /Js m Yf= 

Pi+PiX,+PiX^ 



(c) Compare the residuals found using Jhc results of (6) with those given by 


the mode! 




Nu 

D=05638(GrpPr>** 

« 

612 

Using the data of Problem 6 8 and starting with the last pair of observa 


tions for }'=IogNuD and Ar=logGroPr use 

OLS sequential estimation for 


the model 



613 

Repeat Problem 6 12 for the model 



)-,-ft+ftAr+AJr=+ 


6.14 The following data are a continuation of those in Table 6.2. Using the $Y$
data below and orthogonal polynomials, find a satisfactory model utilizing
the $F$ test at the 5% level of significance.

Obs. No.     Time (sec)     $Y$ (°F)
18           1632           142.93
19           1728           139.34
20           1824           136.04
21           1920           132.94
22           2016           130.07
23           2112           127.39
24           2208           124.88
25           2304           122.46
26           2400           120.15
27           2496           118.04
28           2592           115.97
29           2688           114.13
30           2784           112.35

6.15 

Suppose that t] is given by 




= Up U| r + Ujr* + ujl 



and that 




il(0)=0 

i|(])=l and 



Show that for these conditions the model becomes 



= r(2-/)+/5r(l-f) 



where fi is the single parameter 




6.16 The average rainfall, wind velocity, temperature, etc. at any location over a
number of years is periodic. Assume that the dependent variable $\eta$ is the
function of $t$ given by

$$\eta_i = \beta_1 + \beta_2 \cos 2\pi t_i$$

(a) From reference 24, Table No. 319, the average wind speed (in mph) at
the airport of Great Falls, Montana, is as follows:

Jan.     Feb.     Mar.     Apr.     May      June
15.7     14.8     13.4     13.2     11.4     11.4

July     Aug.     Sept.    Oct.     Nov.     Dec.
10.3     10.5     11.7     13.8     15.0     16.0

Estimate $\beta_1$ and $\beta_2$ using OLS. Calculate the residuals. (Let the January
value be at $t = 0.5/12$, etc.)

Answer. 13.1, 2.64

(b) Suggest another model that may be able to fit the data better than the
one given in (a).

6.17 The following data are normal monthly average temperatures (in °F) given
by reference 24, Table No. 310, for St. Paul, Minnesota, and San Francisco,
California.

              Jan.     Feb.     Mar.     Apr.     May      June
St. Paul      11.8     16.2     28.0     44.6     56.5     66.2
San Fran.     50.9     53.4     54.3     55.3     56.7     58.7

              July     Aug.     Sept.    Oct.     Nov.     Dec.
St. Paul      71.2     69.4     59.1     49.2     31.9     18.2
San Fran.     58.5     59.4     62.2     61.4     57.4     52.0

(a) Suggest an appropriate function for $\eta$ to describe the St. Paul data. The
normal minimum monthly temperature is 8-10°F less for all months and
the normal maximum temperature is 8-10°F greater for all months.

(b) Suggest an appropriate function $\eta$ to describe the San Francisco data.
The normal minimum and maximum monthly temperatures are within
±7°F for all months.

6.18 The model 


T, = /Ji + + £, 


IS proposed for the following data. Very little is known regarding e,. Estimate 





and P2 the sequential method 

^ 

0 114 

1/12 152 

1/4 198 

5/12 157 

1/2 96 

7/12 54 

3/4 -10 

11/12 51 

1 % 

Assuming that the standard assumptions hold estimate the covanance 
matrix of these estimates 

Ai»v.eT for 1-1 6,-10089 6^-10333 f„-0IU P,-00000 Pj2-03333 

6.19 Use orthogonal polynomials for the Example 6.2.3 data. The components of
are 339068 - 330244 2608 68 -1448 64 96600 and 23472. Use the F
test at the 5% level of significance to find the regression line of suitable degree.

6.20 Repeat Problem 6.18 using the standard assumptions and the subjective prior
information that

(If -100 

V .[200 100 

' 1 100 3000 

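Estimate $\beta_1$ and $\beta_2$ using this information and find the covariance matrix of
the estimates. Iteration may be required.

6.21 Show that

$$\mathbf{b}_{\mathrm{MAP}} - \mathbf{b}_{\mathrm{ML}} = -\mathbf{P}_{\mathrm{ML}}(\mathbf{V}_\beta + \mathbf{P}_{\mathrm{ML}})^{-1}(\mathbf{b}_{\mathrm{ML}} - \boldsymbol{\mu}_\beta)$$

Note that if $\mathbf{V}_\beta$ is sufficiently large compared with $\mathbf{P}_{\mathrm{ML}}$, then
$\mathbf{b}_{\mathrm{MAP}} \approx \mathbf{b}_{\mathrm{ML}}$ for any finite $\boldsymbol{\mu}_\beta$.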

622 Write a FORTRAN or programmable calculator program to calculate the 
orthogonal coefficients using (636) Give coefficients For r—A n=9 and 
1-12 6 

6.23 Using the eight measurements at time 0.3 sec of Table 7.14, estimate $\beta_1$ in the
model $T = \beta_1$. Find the 95% confidence interval. What assumptions are
needed?

6.24 Find the 95% confidence region for the data of Example 6.7.3 for the two
parameters in the model. Assume the errors are additive, have
zero mean and constant variance, and are uncorrelated and normal. Also let
T have negligible errors. There is no prior information regarding the parameters.


6.25 Using the temperature measurements at times 0.3, 0.6, 0.9, and 1.2 sec for all
eight thermocouples of Table 7.14, estimate $\beta_1$ in the model $T = \eta = \beta_1$.
Assume

$$Y_j(i) = \beta_1 + \varepsilon_j(i)$$

$$\varepsilon_j(1) = u_j(1)$$

$$\varepsilon_j(i) = \varepsilon_j(i-1) + u_j(i) \quad \text{for } i = 2, 3, 4$$

Uj{i) = N(0yi) 

Use maximum likelihood estimation. Also find the 95% confidence interval. 

6.26 The temperatures for thermocouples 5-8 of Table 7.14 can be described by

$$T = \beta_1 + \beta_2 (t - 3.3)^{1/2}$$

from $t = 3.3$ to 7.5 sec. $\beta_2$ is equal to $2q(\pi k \rho c)^{-1/2}$ with $q$ being heat flux, $k$
thermal conductivity, $\rho$ density, and $c$ specific heat. Using temperatures from
thermocouples 5 and 6 from 3.3 to 7.5 sec, estimate $\beta_1$ and $\beta_2$. Use the
sequential estimation method for $m = 1$. Calculate and examine the residuals.
Discuss your assumptions. How can the model be improved?

6.27 Derive (6.7.9) and (6.7.10). 

6.28 The following are temperature measurements taken from Table 7.14; 

/, time (sec) Tf, T, Tg 

15 94.56 93.91 94.75 94.17 

18 96.52 95.70 96.30 95.96 

(a) Estimate /3 in the model T=95.5 + /i((— 12) using OLS estimation. 

(b) Find the 95% confidence interval for the estimate of $\beta$. What assumptions
are needed?

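6.29 Let the assumptions on the measurement errors be

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\boldsymbol{\Omega}) \text{ with } \sigma^2 \text{ unknown and } \boldsymbol{\Omega} \text{ known}$$

$\mathbf{X}$ is errorless. The prior information regarding the random parameter vector
$\boldsymbol{\beta}$ is $N(\boldsymbol{\mu}_\beta, \sigma^2\mathbf{V})$ where $\mathbf{V}$ is known. Derive the MAP estimators for $\boldsymbol{\beta}$ and
$\sigma^2$.

$$\mathbf{b}_{\mathrm{MAP}} = \boldsymbol{\mu}_\beta + \mathbf{P}\mathbf{X}^T\boldsymbol{\Omega}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol{\mu}_\beta)$$

where

$$\mathbf{P}^{-1} = \mathbf{X}^T\boldsymbol{\Omega}^{-1}\mathbf{X} + \mathbf{V}^{-1}, \qquad \hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}_{\mathrm{MAP}}$$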



CHAPTER 7

MINIMIZATION OF SUM OF SQUARES FUNCTIONS FOR MODELS
NONLINEAR IN PARAMETERS


7.1 INTRODUCTION

This chapter is concerned with methods for minimizing a general sum of
squares function when the dependent variable is nonlinear in terms of the
parameters. The function to be extremized is assumed to be known,
although selection of such a function is not a trivial matter and comprises
the first optimization problem in parameter estimation. (Extremize means
either maximize or minimize.) The extremization of the chosen function,
the topic of this chapter, is the second optimization problem of parameter
estimation. Other optimization problems relate to optimal design of
experiments and optimal designs for discrimination between competing models.

In engineering and science most phenomena are modeled using differential
equations. The solution of these equations may be available in closed
form or may be obtained through the use of finite difference or finite
element computer solutions. Regardless of the method of solution of a
differential equation, the model is more often than not a nonlinear function
of the parameters. The differential equations and boundary and initial
conditions may be linear in the usual mathematical sense and still have a
solution that is nonlinear in terms of the parameters. See, for examples, the
models in Section 7.5.




A problem can be either linear or nonlinear. The nonlinearity in one 
case can pose more difficulties in obtaining a solution than in another 
nonlinear case, however. 

The extremum found for the sum of squares function when the model is 
linear in the parameters can be proved to be the correct one for OLS 
estimation as well as for ML estimation provided the conditions 11--1011
(see Section 6.1.5.2) are satisfied and a unique minimum point exists.
Complete assurance* cannot readily be given for nonlinear cases since 
there may be more than one extremum. If one has reason to doubt that the 
extremum found is the desired one, it is recommended that contours of 
constant S be plotted in the region in which the solution is expected. This 
involves an extensive search. Another possibility is to start the iteration 
procedure with different sets of initial parameter values. 

It should be noted that the same problems associated with ill-condition- 
ing in linear problems also arise in nonlinear estimation problems. In 
linear problems the theoretical existence of a minimum may be a mirage
for ill-conditioned cases, that is, those associated with relatively small
$|\mathbf{X}^T\mathbf{X}|$ values; see Section 6.7.6. In such cases slight changes in the
measurements can cause large movement in the location of the minimum, 
resulting in large perturbations in the estimated parameter values. Because 
of this sensitivity for ill-conditioned cases, convergence proofs for nonlin- 
ear cases may also be more academic than practical. 

Several simple optimum seeking methods are given in this section. In 
Section 7.4 the Gauss method is given, and in Section 7.6 several modifica- 
tions of that method are described. Also in Section 7.6 a comparison of 
several methods is given. Later sections discuss sequential methods and 
correlated errors. 

7.1.1 Trial and Error Search

One of the simplest procedures for extremizing a function is trial and error.
It is quite inefficient and is not recommended. It is easily described and
understood, however.

As for most nonlinear search procedures, the trial and error procedure
starts with a set of estimated values of all the parameters. Let the initial
parameter estimates be designated $\mathbf{b}^{(0)}$ and for simplicity let there be an
associated OLS sum of squares

$$S^{(0)} = [\mathbf{Y} - \boldsymbol{\eta}(\mathbf{b}^{(0)})]^T[\mathbf{Y} - \boldsymbol{\eta}(\mathbf{b}^{(0)})] \tag{7.1.1}$$

(A more general function is given in Section 7.3.) Next, another set of
parameters $\mathbf{b}^{(1)}$ is chosen more or less arbitrarily and $S^{(1)}$ is calculated. If

$$S^{(1)} < S^{(0)} \tag{7.1.2}$$

then the combination of parameters given by $\mathbf{b}^{(1)}$ must be "better" than
$\mathbf{b}^{(0)}$; one might wish to select a $\mathbf{b}^{(2)}$ to be a modification of $\mathbf{b}^{(1)}$ in the same
manner that $\mathbf{b}^{(1)}$ was of $\mathbf{b}^{(0)}$. If the inequality in (7.1.2) is reversed, then we
would not proceed in the same manner. There are many possible
strategies that could be employed to choose different b's. One is to fix
all the b's except one, which is varied until S reaches a relative minimum,
at which time this parameter is fixed and another one is varied. This
procedure usually requires considering further changes in each parameter
after the other parameters have been reestimated.

In the procedure outlined above there is no rule that must be followed
in selection of new sets of parameters. Instead one can try any set of
parameters that seems reasonable. This can be done interactively using a
teletype computer terminal. In so doing one usually finds that this method
of solution is inefficient, time consuming, and not practical. One can,
however, obtain a "feel" for the difficulty of extremizing a function,
particularly when there is more than one parameter.

* For some cases convergence proofs are available but they are usually not practical to
implement for a number of different reasons. See Bard [5, p. 87].


7.1.2 Exhaustive Search

Another simple (and also inefficient) procedure is termed an exhaustive
search. To illustrate this procedure consider the case of only one unknown
parameter, $\beta$. The "best" value of this parameter is to be associated with
the minimum value of S. Instead of selecting only an initial estimate, a
region of $\beta$ is chosen in which the minimum value of S is expected
to be. Suppose that the parameter $\beta$ is known to be between 0.5 and 2.0. In
an exhaustive search S is calculated at equally spaced values of b in this
region. ($\beta$ is the true value and b is an estimated value.) Fig. 7.1a shows S
for $\Delta b$ intervals of 0.25 and indicates the best b value on that grid.
A more accurate value of b could be found by conducting an
exhaustive search with a smaller $\Delta b$ in a reduced region bracketing that value.

The exhaustive search procedure is undoubtedly expensive but it does
have the potential of revealing local minima in addition to the global
minimum. This is illustrated by Fig. 7.1b. The exhaustive search procedure
is more likely to produce the global minimum than some other schemes.
Irrelevant local minima are encountered in parameter estimation but are
not common.
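
A coarse-then-fine exhaustive search of this kind can be sketched in a few lines. The model $\eta = \exp(-\beta t)$, the error-free data generated with $\beta = 1.3$, and the function names are hypothetical choices made only for illustration; the search region 0.5 to 2.0 and the interval 0.25 follow the text.

```python
import math

def sum_of_squares(b, t, y):
    """OLS sum of squares for the one-parameter model eta = exp(-b*t)."""
    return sum((yi - math.exp(-b * ti)) ** 2 for ti, yi in zip(t, y))

def exhaustive_search(S, lo, hi, step):
    """Evaluate S on an equally spaced grid over [lo, hi]; return (best b, all (b, S) pairs)."""
    count = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(count)]
    table = [(b, S(b)) for b in grid]
    best_b = min(table, key=lambda pair: pair[1])[0]
    return best_b, table

# Hypothetical, error-free data generated with beta = 1.3.
t = [0.5, 1.0, 1.5, 2.0]
y = [math.exp(-1.3 * ti) for ti in t]
S = lambda b: sum_of_squares(b, t, y)

# Coarse pass over the whole region, then a finer pass around the coarse minimum.
coarse_b, coarse_table = exhaustive_search(S, 0.5, 2.0, 0.25)
fine_b, fine_table = exhaustive_search(S, coarse_b - 0.25, coarse_b + 0.25, 0.05)
print(coarse_b, fine_b)
```

Keeping the full table of (b, S) pairs, rather than only the best point, is what allows local minima to be noticed.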







It should also be noted that the global minimum may not necessarily be 
the desired one. For example, a certain mechanical device may be known 
to have a natural frequency about 1 Hz. This frequency is to be more 
precisely estimated utilizing several measurements of deflection versus time 
of the device. A local minimum of S would be expected near 1 Hz but the 
global minimum might occur at a considerably higher frequency. Another 
example occurs when the functional form of S incorporated in a computer 
program may have a global minimum at negative values of parameters 
which are not physically possible. 

7.1.3 Other Methods

There are many other methods for locating extremes of arbitrary functions.
These include direct search, Fibonacci search, gradient methods, random
search, Hooke-Jeeves search, simplex exploration, and dynamic programming
methods. See a text on optimization such as Beveridge and Schechter
[2]. Most of these methods are much more efficient than the two methods
just mentioned.

Rather than giving many methods we shall emphasize one basic method
and then suggest some modifications of it. This method is called the Gauss
method. It has proved to be very effective for a large class of different
parameter estimation problems.

7.2 MATRIX FORM OF TAYLOR SERIES EXPANSION 

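In the Gauss linearization method of minimizing a sum of squares function
a Taylor series expansion of a vector is needed. Let $\boldsymbol{\eta}$ be an n vector and a
function of the p parameters in the $\boldsymbol{\beta}$ vector. Let $\boldsymbol{\eta}$ have continuous
derivatives in the neighborhood of $\boldsymbol{\beta} = \mathbf{b}$. Then the Taylor series for a point
$\boldsymbol{\beta}$ near $\mathbf{b}$ begins with the terms

$$\boldsymbol{\eta}(\boldsymbol{\beta}) = \boldsymbol{\eta}(\mathbf{b}) + [\nabla_\beta \boldsymbol{\eta}^T(\mathbf{b})]^T(\boldsymbol{\beta} - \mathbf{b}) + \cdots \tag{7.2.1}$$

where $\nabla_\beta$ is the matrix derivative operator defined by (6.1.21).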


7.3 SUM OF SQUARES FUNCTION

In ordinary least squares, weighted least squares, maximum likelihood, and
maximum a posteriori estimation, the sum of squares functions to be
minimized are generally different. In some cases, however, the MAP sum
of squares function reduces to that for ML, which in turn reduces to that
for OLS estimation. For that reason and for economy in presentation a
sum of squares function is given that is appropriate for OLS, WLS, ML,
and MAP estimation when appropriately specialized. The function that we
consider in this chapter is

$$S = [\mathbf{Y} - \boldsymbol{\eta}(\boldsymbol{\beta})]^T\mathbf{W}[\mathbf{Y} - \boldsymbol{\eta}(\boldsymbol{\beta})] + (\boldsymbol{\mu} - \boldsymbol{\beta})^T\mathbf{U}(\boldsymbol{\mu} - \boldsymbol{\beta}) \tag{7.3.1}$$

Both $\mathbf{W}$ and $\mathbf{U}$ are weighting matrices which are symmetric; $\mathbf{W}$ is positive
definite and $\mathbf{U}$ is positive semidefinite. In many cases $\mathbf{W}$ and $\mathbf{U}$ will be
assumed to be completely known.
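
A direct evaluation of (7.3.1) can be sketched as below, using plain Python lists for vectors and matrices. The function names are illustrative; the numerical values are those of Example 7.4.1 later in this chapter, where $\mathbf{W} = \boldsymbol{\psi}^{-1}$ and the model is $\eta = \beta_1 + (1 + \beta_2 t)^2$.

```python
def quad_form(v, M):
    """Return v^T M v for a vector v and a square matrix M (nested lists)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def sum_of_squares(Y, eta, W, mu, beta, U):
    """S = [Y - eta]^T W [Y - eta] + (mu - beta)^T U (mu - beta), as in (7.3.1)."""
    e = [y - f for y, f in zip(Y, eta)]
    d = [m - b for m, b in zip(mu, beta)]
    return quad_form(e, W) + quad_form(d, U)

# Values from Example 7.4.1.
t = [-1.0, 0.0, 1.0]
Y = [1.0, 3.0, 8.5]
W = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
U = [[0.0, 0.0], [0.0, 4.0]]
mu = [2.0, 1.0]

def model(beta):
    return [beta[0] + (1.0 + beta[1] * ti) ** 2 for ti in t]

S0 = sum_of_squares(Y, model([2.0, 1.0]), W, mu, [2.0, 1.0], U)   # at the initial estimates
S1 = sum_of_squares(Y, model([1.0, 1.5]), W, mu, [1.0, 1.5], U)   # after one Gauss iteration
print(S0, S1)
```

Setting `U` to a zero matrix drops the prior-information term, and replacing `W` with an identity matrix gives the OLS sum of squares.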

The cases of single and multiresponse for discrete measurements are
included in (7.3.1). When a single sensor is used to obtain many
observations, we have a single response case. In this situation the observation
vector $\mathbf{Y}$ and corresponding vector found from the model, $\boldsymbol{\eta}$, are n vectors.
The square matrix $\mathbf{W}$ is $n \times n$. The parameter vector $\boldsymbol{\beta}$ contains p
components as does $\boldsymbol{\mu}$; $\mathbf{U}$ is a square matrix of dimensions $p \times p$. Much of this
chapter explicitly considers the discrete single response case.

Extensions to multiresponse cases of the algorithms given in Sections 7.4
and 7.6 are not difficult. Section 7.8 provides a sequential method for this
case. The multiresponse case occurs when m (> 1) measurements are taken
at n different times. The observation vector can be written as

$$\mathbf{Y} = \begin{bmatrix} \mathbf{Y}(1) \\ \mathbf{Y}(2) \\ \vdots \\ \mathbf{Y}(n) \end{bmatrix} \quad \text{where} \quad \mathbf{Y}(i) = \begin{bmatrix} Y_1(i) \\ Y_2(i) \\ \vdots \\ Y_m(i) \end{bmatrix} \tag{7.3.2}$$

Hence the $\mathbf{Y}$ vector contains mn components. The $\boldsymbol{\eta}$ vector can be
similarly defined and $\mathbf{W}$ is $mn \times mn$. The $\boldsymbol{\beta}$, $\boldsymbol{\mu}$, and $\mathbf{U}$ matrices remain as
given above.

In some situations it is natural to consider continuous rather than 
discrete measurements in time. This may be either because the measure- 
ments are actually continuous or because it is more convenient to analyze 
them as if they were. Then we would replace (7.3.1) by 

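$$S = \int_0^{t_f} [\mathbf{Y}(t) - \boldsymbol{\eta}(t, \boldsymbol{\beta})]^T\mathbf{W}(t)[\mathbf{Y}(t) - \boldsymbol{\eta}(t, \boldsymbol{\beta})]\,dt + (\boldsymbol{\mu} - \boldsymbol{\beta})^T\mathbf{U}(\boldsymbol{\mu} - \boldsymbol{\beta}) \tag{7.3.3}$$

where the time limits are 0 and $t_f$. If the case being considered involves a
single response, $\mathbf{Y}(t)$ becomes the scalar $Y(t)$. For the multiresponse case
the $\mathbf{Y}(t)$ vector is

$$\mathbf{Y}(t) = \begin{bmatrix} Y_1(t) \\ Y_2(t) \\ \vdots \\ Y_m(t) \end{bmatrix} \tag{7.3.4}$$

and $\boldsymbol{\eta}(t, \boldsymbol{\beta})$ is similarly defined. In many algorithms to be given, the
summations on a time index can be replaced by integrations over time.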



One further modification (or interpretation, depending on one's
viewpoint) of (7.3.1) is for situations in which the dependent variables in
the system model are not measured directly. This is a case discussed
frequently in the systems literature. To illustrate this case assume that the
system model is the set of nonlinear, first-order ordinary differential
equations,

$$\dot{\mathbf{x}} = \mathbf{f}(t, \mathbf{x}, \boldsymbol{\beta}, \mathbf{u}) \tag{7.3.5}$$

The dimension of the dependent variable $\mathbf{x}$ is r; $\mathbf{f}$ is some known function
of $\mathbf{x}$, t, $\boldsymbol{\beta}$, and $\mathbf{u}$, which could be related to some forcing function or control.
It may be that all the components of $\mathbf{x}$ are not measured directly. Instead
some linear or nonlinear function of the $\mathbf{x}$ may be measured,

$$\mathbf{Y} = \mathbf{h}\mathbf{x} + \boldsymbol{\varepsilon} \quad \text{or} \quad \mathbf{Y} = \mathbf{g}(t, \mathbf{x}) + \boldsymbol{\varepsilon} \tag{7.3.6}$$

Here the dimension m of each $\mathbf{Y}(i)$ component of the $\mathbf{Y}$ vector would be equal
to or less than r. In such cases the dependent variable $\boldsymbol{\eta}$ in (7.3.1) could be
replaced by $\mathbf{h}\mathbf{x}$ or $\mathbf{g}(t, \mathbf{x})$.

7.4 GAUSS METHOD OF MINIMIZATION

7.4.1 Derivation

One of the simplest and most effective methods of minimizing the function
S is variously called the Gauss, Gauss-Newton, Newton-Gauss, or
linearization method; we call it the Gauss method. It is attractive because it is
relatively simple and because it specifies direction and size of the
corrections to the parameter vector. The method is effective in seeking minima
that are reasonably well-defined provided the initial estimates are in the
general region of the minimum. For difficult cases (i.e., those with
indistinct minima) modifications to the Gauss method discussed in Section 7.6
are recommended.

A necessary condition at the minimum of S is that the matrix derivative
of S with respect to $\boldsymbol{\beta}$ be equal to zero. For this reason operate upon S
using (6.1.30) to get

$$\nabla_\beta S = 2[-\nabla_\beta \boldsymbol{\eta}^T(\boldsymbol{\beta})]\mathbf{W}[\mathbf{Y} - \boldsymbol{\eta}(\boldsymbol{\beta})] + 2[-\mathbf{I}]\mathbf{U}(\boldsymbol{\mu} - \boldsymbol{\beta}) \tag{7.4.1}$$

Let us use the notation $\mathbf{X}(\boldsymbol{\beta})$ for the sensitivity matrix,

$$\mathbf{X}(\boldsymbol{\beta}) = [\nabla_\beta \boldsymbol{\eta}^T(\boldsymbol{\beta})]^T \tag{7.4.2}$$


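so that (7.4.1) set equal to zero at $\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}$ becomes

$$\mathbf{X}^T(\hat{\boldsymbol{\beta}})\mathbf{W}[\mathbf{Y} - \boldsymbol{\eta}(\hat{\boldsymbol{\beta}})] + \mathbf{U}(\boldsymbol{\mu} - \hat{\boldsymbol{\beta}}) = \mathbf{0} \tag{7.4.3}$$

Unfortunately, we cannot easily solve for the estimator $\hat{\boldsymbol{\beta}}$ since $\hat{\boldsymbol{\beta}}$ appears
implicitly in $\boldsymbol{\eta}$ and $\mathbf{X}$ as well as appearing explicitly. Suppose that we have
an estimate of $\hat{\boldsymbol{\beta}}$ denoted $\mathbf{b}$ and that $\boldsymbol{\eta}$ has continuous first derivatives in $\boldsymbol{\beta}$
and bounded higher derivatives near $\mathbf{b}$. Two approximations are now used
in (7.4.3). First, replace $\mathbf{X}(\hat{\boldsymbol{\beta}})$ by $\mathbf{X}(\mathbf{b})$ and second, use the first two terms
of a Taylor series for $\boldsymbol{\eta}(\hat{\boldsymbol{\beta}})$ about $\mathbf{b}$. Then (7.4.3) becomes

$$\mathbf{X}^T(\mathbf{b})\mathbf{W}[\mathbf{Y} - \boldsymbol{\eta}(\mathbf{b}) - \mathbf{X}(\mathbf{b})(\hat{\boldsymbol{\beta}} - \mathbf{b})] + \mathbf{U}(\boldsymbol{\mu} - \mathbf{b}) - \mathbf{U}(\hat{\boldsymbol{\beta}} - \mathbf{b}) \approx \mathbf{0} \tag{7.4.4}$$

Note that this equation is linear in $\hat{\boldsymbol{\beta}}$. If (1) $\boldsymbol{\eta}$ is not too far from being
linear in $\boldsymbol{\beta}$ in a region about the solution to (7.4.3) and if (2) this region
includes $\mathbf{b}$, the value of $\hat{\boldsymbol{\beta}}$ satisfying (7.4.4) will be a better approximation
to the solution of (7.4.3) than that provided by $\mathbf{b}$. Assuming these two
conditions to be true, (7.4.4) is set equal to zero. In the interest of
compactness of notation and to indicate an iterative procedure let

$$\mathbf{b}^{(k)} = \mathbf{b}, \quad \mathbf{b}^{(k+1)} = \hat{\boldsymbol{\beta}}, \quad \boldsymbol{\eta}^{(k)} = \boldsymbol{\eta}(\mathbf{b}), \quad \mathbf{X}^{(k)} = \mathbf{X}(\mathbf{b}) \tag{7.4.5}$$

Using this notation in (7.4.4) set equal to $\mathbf{0}$ yields p equations in matrix
form for $\mathbf{b}^{(k+1)}$,

$$\mathbf{b}^{(k+1)} = \mathbf{b}^{(k)} + \mathbf{P}^{(k)}[\mathbf{X}^{T(k)}\mathbf{W}(\mathbf{Y} - \boldsymbol{\eta}^{(k)}) + \mathbf{U}(\boldsymbol{\mu} - \mathbf{b}^{(k)})] \tag{7.4.6a}$$

$$\mathbf{P}^{-1(k)} = \mathbf{X}^{T(k)}\mathbf{W}\mathbf{X}^{(k)} + \mathbf{U} \tag{7.4.6b}$$

which is the Gauss linearization equation. Iteration on k is required for
nonlinear models. For linear-in-the-parameters models no iterations are
required. Note that for $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$, (7.4.6) reduces to the MAP equation
(6.6.6a) by setting $\mathbf{b}^{(k)}$ equal to zero. No constraints are included in (7.4.6).

To use (7.4.6) in nonlinear cases an initial estimate of $\boldsymbol{\beta}$, designated
$\mathbf{b}^{(0)}$, is needed. With this vector $\boldsymbol{\eta}^{(0)}$ and $\mathbf{X}^{(0)}$ can be calculated, which, in
turn, are used in (7.4.6a) to obtain the improved estimate vector $\mathbf{b}^{(1)}$. This
completes the first iteration. Then $\boldsymbol{\eta}^{(1)}$ and $\mathbf{X}^{(1)}$ are evaluated so that $\mathbf{b}^{(2)}$
can be found. The iterative procedure continues until there is negligible
change in any component of $\mathbf{b}$; one criterion to indicate this is

$$\frac{|b_i^{(k+1)} - b_i^{(k)}|}{|b_i^{(k)}| + \delta_1} < \delta \quad \text{for } i = 1, 2, \ldots, p \tag{7.4.7}$$

where $\delta$ is a small number such as $10^{-4}$. In order to avoid embarrassment
if $b_i^{(k)}$ goes to zero, the quantity $\delta_1$ is set equal to another small number
such as $10^{-10}$. When good initial estimates of the parameters are available
and the experiment is well designed, (7.4.7) is frequently satisfied by the
seventh iteration. See Table 7.3 for an example of this. (The fact that
(7.4.7) is satisfied does not guarantee that the last $\mathbf{b}^{(k+1)}$ minimizes S,
particularly when the minimum is ill-defined.)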

As a minimum is being sought, the function S should logically decrease
from iteration to iteration. One might then include a check in a computer
program to see if $S^{(k+1)}$ is less than $S^{(k)}$. If it is not, the procedure could
either terminate or the correction of the parameters, $\Delta\mathbf{b}^{(k)}$, could be
decreased as discussed in Section 7.6. In some cases, however, a temporary
increase in S could permit larger parameter changes to the region of the
minimum and actually lead to more rapid convergence.
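
The stopping rule (7.4.7), a relative change in every parameter smaller than $\delta$ with $\delta_1$ guarding against division by a zero parameter, can be sketched as a small function. The default values of `delta` and `delta1` are illustrative assumptions, and the iterates used in the demonstration are the ones listed in Table 7.1 for Example 7.4.1.

```python
def converged(b_new, b_old, delta=1e-4, delta1=1e-10):
    """True when every |b_new - b_old| / (|b_old| + delta1) is below delta, as in (7.4.7)."""
    return all(abs(bn - bo) / (abs(bo) + delta1) < delta
               for bn, bo in zip(b_new, b_old))

# Iterates 2, 3 and 4 of Example 7.4.1 (Table 7.1).
b2 = [0.816667, 1.433333]
b3 = [0.810561, 1.435251]
b4 = [0.810630, 1.435166]

stop_after_third = converged(b3, b2)    # relative changes near 1e-2 and 1e-3
stop_after_fourth = converged(b4, b3)   # relative changes below 1e-4
print(stop_after_third, stop_after_fourth)
```

A program would normally pair this test with a check that S has not increased, since meeting the criterion does not by itself guarantee a minimum.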


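7.4.2 Components of Gauss Linearization Equation

Consider the sensitivity matrix as defined by (7.4.2). Without showing the
dependence on $\mathbf{b}^{(k)}$, it can be written for a single response case as

$$\mathbf{X} = \begin{bmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial\eta_1}{\partial\beta_1} & \cdots & \dfrac{\partial\eta_1}{\partial\beta_p} \\ \vdots & & \vdots \\ \dfrac{\partial\eta_n}{\partial\beta_1} & \cdots & \dfrac{\partial\eta_n}{\partial\beta_p} \end{bmatrix} \tag{7.4.8}$$

Hence the ij element of $\mathbf{X}^{(k)}$ is

$$X_{ij}^{(k)} = \left.\frac{\partial\eta_i}{\partial\beta_j}\right|_{\boldsymbol{\beta} = \mathbf{b}^{(k)}} \tag{7.4.9}$$

This definition of $\mathbf{X}$ is consistent with the linear model. A simple example
is $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, where $X_{ij}$ has the same meaning as in (7.4.9). A model
which is nonlinear in a parameter is

$$\eta_i = \beta_1 \exp(\beta_2 t_i) + \beta_3 \tag{7.4.10}$$

Its sensitivity coefficients are

$$X_{i1} = \frac{\partial\eta_i}{\partial\beta_1} = \exp(\beta_2 t_i), \quad X_{i2} = \frac{\partial\eta_i}{\partial\beta_2} = \beta_1 t_i \exp(\beta_2 t_i), \quad X_{i3} = 1 \tag{7.4.11}$$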
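
The sensitivity coefficients of the exponential model (7.4.10), namely $\exp(\beta_2 t_i)$, $\beta_1 t_i \exp(\beta_2 t_i)$, and 1, can be checked against central finite differences. This is a sketch only; the parameter values, times, and step size `h` are hypothetical.

```python
import math

def eta(beta, t):
    """Model (7.4.10): eta = beta1*exp(beta2*t) + beta3."""
    b1, b2, b3 = beta
    return b1 * math.exp(b2 * t) + b3

def sensitivity_row(beta, t):
    """Analytical sensitivity coefficients [X_i1, X_i2, X_i3] at time t."""
    b1, b2, _ = beta
    e = math.exp(b2 * t)
    return [e, b1 * t * e, 1.0]

def sensitivity_row_fd(beta, t, h=1e-6):
    """Central-difference approximation to the same row."""
    row = []
    for j in range(len(beta)):
        up = list(beta)
        dn = list(beta)
        up[j] += h
        dn[j] -= h
        row.append((eta(up, t) - eta(dn, t)) / (2 * h))
    return row

beta = [2.0, -0.5, 1.0]          # hypothetical parameter values
times = [0.0, 1.0, 2.0]          # hypothetical measurement times
X = [sensitivity_row(beta, t) for t in times]
X_fd = [sensitivity_row_fd(beta, t) for t in times]
max_diff = max(abs(a - b) for ra, rb in zip(X, X_fd) for a, b in zip(ra, rb))
print(max_diff)
```

A finite-difference row of this kind is also a practical fallback when the model comes from a numerical solution and analytical derivatives are not available.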

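The matrix $\mathbf{X}^T\mathbf{W}\mathbf{X}$ is a symmetric matrix of dimensions $p \times p$. Let

$$\mathbf{C} = \mathbf{X}^T\mathbf{W}\mathbf{X} \tag{7.4.12}$$

where the jl element of $\mathbf{C}$ is

$$C_{jl} = \sum_{i=1}^{n}\sum_{r=1}^{n} W_{ir} X_{ij} X_{rl} = \sum_{i=1}^{n}\sum_{r=1}^{n} W_{ir}\frac{\partial\eta_i}{\partial\beta_j}\frac{\partial\eta_r}{\partial\beta_l} \tag{7.4.13}$$

If the weighting matrix is diagonal, (7.4.13) simplifies to

$$C_{jl} = \sum_{i=1}^{n} W_{ii} X_{ij} X_{il} = \sum_{i=1}^{n} W_{ii}\frac{\partial\eta_i}{\partial\beta_j}\frac{\partial\eta_i}{\partial\beta_l} \tag{7.4.14}$$

Let the matrix product $\mathbf{X}^T\mathbf{W}(\mathbf{Y} - \boldsymbol{\eta})$ in (7.4.6a) be designated $\mathbf{H}$,

$$\mathbf{H} = \mathbf{X}^T\mathbf{W}(\mathbf{Y} - \boldsymbol{\eta}) \tag{7.4.15}$$

which is a $p \times 1$ vector and has a typical component of

$$H_j = \sum_{i=1}^{n}\sum_{r=1}^{n} W_{ir} X_{ij}(Y_r - \eta_r) \tag{7.4.16}$$

For $\mathbf{C}^{(k)}$ and $\mathbf{H}^{(k)}$ the quantities $X_{ij}$ and $\eta_i$ are evaluated with $\boldsymbol{\beta} = \mathbf{b}^{(k)}$.
For the simple case of one parameter (p = 1) the iterative equation is

$$b^{(k+1)} = b^{(k)} + \frac{H^{(k)} + U(\mu - b^{(k)})}{C^{(k)} + U} \tag{7.4.17}$$

An initial estimate of $\beta$, designated $b^{(0)}$, must be provided.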

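Example 7.4.1

Estimate $\beta_1$ and $\beta_2$ in the model $\eta = \beta_1 + (1 + \beta_2 t)^2$ using $\mu_1 = b_1^{(0)} = 2$, $\mu_2 = b_2^{(0)} = 1$,

$$\boldsymbol{\psi} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \quad \boldsymbol{\psi}^{-1} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}, \quad \mathbf{V}_\beta^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix}$$

(This $\boldsymbol{\psi}$ function is associated with cumulative errors in $Y_i$.) The data are

    t_i    -1     0     1
    Y_i     1     3     8.5

Use the sum of squares function given by (7.3.1) in which $\boldsymbol{\psi}^{-1}$ is to be used for
$\mathbf{W}$ and $\mathbf{V}_\beta^{-1}$ for $\mathbf{U}$.

This problem is solved using (7.4.6) in an iterative manner. Consider the first
iteration. The sensitivity coefficients and $\boldsymbol{\eta}$ values are

$$X_{i1} = \frac{\partial\eta_i}{\partial\beta_1} = 1, \quad X_{i2} = \frac{\partial\eta_i}{\partial\beta_2} = 2(1 + \beta_2 t_i)t_i$$

$$\mathbf{X}^{(0)} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 4 \end{bmatrix}, \quad \boldsymbol{\eta}^{(0)} = \begin{bmatrix} 2 \\ 3 \\ 6 \end{bmatrix}$$

Using this vector we find

$$\mathbf{X}^{T(0)}\boldsymbol{\psi}^{-1}\mathbf{X}^{(0)} = \begin{bmatrix} 1 & 0 \\ 0 & 16 \end{bmatrix}$$

and thus $\mathbf{P}^{-1(0)}$ is

$$\mathbf{P}^{-1(0)} = \begin{bmatrix} 1 & 0 \\ 0 & 20 \end{bmatrix}$$

which has the inverse

$$\mathbf{P}^{(0)} = \begin{bmatrix} 1 & 0 \\ 0 & 1/20 \end{bmatrix}$$

Another expression needed in (7.4.6) is

$$\mathbf{X}^{T(0)}\boldsymbol{\psi}^{-1}(\mathbf{Y} - \boldsymbol{\eta}^{(0)}) + \mathbf{U}(\boldsymbol{\mu} - \mathbf{b}^{(0)}) = \begin{bmatrix} -1 \\ 10 \end{bmatrix}$$

Then use (7.4.6a) to compute the improved estimate

$$\mathbf{b}^{(1)} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1/20 \end{bmatrix}\begin{bmatrix} -1 \\ 10 \end{bmatrix} = \begin{bmatrix} 1 \\ 1.5 \end{bmatrix}$$

The initial sum of squares as defined by (7.3.1) has a value of 8.25. After
the first iteration we calculate S to be

$$S^{(1)} = [\mathbf{Y} - \boldsymbol{\eta}^{(1)}]^T\boldsymbol{\psi}^{-1}[\mathbf{Y} - \boldsymbol{\eta}^{(1)}] + (\boldsymbol{\mu} - \mathbf{b}^{(1)})^T\mathbf{U}(\boldsymbol{\mu} - \mathbf{b}^{(1)})$$

$$= \begin{bmatrix} 1 - 1.25 \\ 3 - 2 \\ 8.5 - 7.25 \end{bmatrix}^T \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} -0.25 \\ 1 \\ 1.25 \end{bmatrix} + \begin{bmatrix} 2 - 1 \\ 1 - 1.5 \end{bmatrix}^T \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ -0.5 \end{bmatrix} = 2.6875$$

which is lower than $S^{(0)}$. (If it were not, then a method given in Section 7.6 could
be used.)

For the second iteration the calculations proceed in a similar manner. We have

$$\mathbf{X}^{(1)} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 5 \end{bmatrix}, \quad \mathbf{P}^{-1(1)} = \begin{bmatrix} 1 & 1 \\ 1 & 31 \end{bmatrix}, \quad \mathbf{P}^{(1)} = \frac{1}{30}\begin{bmatrix} 31 & -1 \\ -1 & 1 \end{bmatrix}$$

$$\mathbf{X}^{T(1)}\boldsymbol{\psi}^{-1}(\mathbf{Y} - \boldsymbol{\eta}^{(1)}) + \mathbf{U}(\boldsymbol{\mu} - \mathbf{b}^{(1)}) = \begin{bmatrix} -0.25 \\ -0.25 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ -0.5 \end{bmatrix} = \begin{bmatrix} -0.25 \\ -2.25 \end{bmatrix}$$

which results in

$$\mathbf{b}^{(2)} = \begin{bmatrix} 1 \\ 1.5 \end{bmatrix} + \frac{1}{30}\begin{bmatrix} 31 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} -0.25 \\ -2.25 \end{bmatrix} = \begin{bmatrix} 0.81667 \\ 1.4333 \end{bmatrix}$$

The associated S value is $S^{(2)} = 2.497059$ which is, as it should be, less than $S^{(1)}$.
The above results along with those of the third and fourth iterations are
summarized in Table 7.1. The expression $\Delta b_i^{(k)}$ means $\Delta b_i^{(k)} = b_i^{(k+1)} - b_i^{(k)}$. Notice
that the results converge quite rapidly since the relative changes in both parameters
by the fourth iteration are less in absolute value than $10^{-4}$ (thus satisfying (7.4.7) if
$\delta$ is $10^{-4}$). There are some changes in sign in the corrections, $\Delta b_i$, but no instability
is noted.


Table 7.1 Summary of Calculations for Example 7.4.1

    k   b1^(k)     b2^(k)     Δb1^(k-1)/b1^(k-1)   Δb2^(k-1)/b2^(k-1)   S^(k)
    0   2          1                                                     8.250000
    1   1          1.50000    -0.5000              0.500                 2.687500
    2   0.816667   1.433333   -0.1833              -0.0444               2.497059
    3   0.810561   1.435251   -0.748×10^-2         0.134×10^-2           2.496940
    4   0.810630   1.435166   8.56×10^-5           -5.85×10^-5           2.496939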
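
The iterations of Example 7.4.1 can be reproduced with a short script that applies (7.4.6a) and (7.4.6b) directly. The sketch below is illustrative: it hard-codes the two-parameter case so that the 2 x 2 system can be solved by Cramer's rule, and the function names are not from the text.

```python
def model(b, t):
    """eta = beta1 + (1 + beta2*t)^2."""
    return b[0] + (1.0 + b[1] * t) ** 2

def sens(b, t):
    """Sensitivity coefficients [d eta/d beta1, d eta/d beta2]."""
    return [1.0, 2.0 * (1.0 + b[1] * t) * t]

t = [-1.0, 0.0, 1.0]
Y = [1.0, 3.0, 8.5]
W = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]   # psi inverse
U = [[0.0, 0.0], [0.0, 4.0]]                                   # V_beta inverse
mu = [2.0, 1.0]

def gauss_step(b):
    """One Gauss iteration: b + P [X^T W (Y - eta) + U (mu - b)], with P^-1 = X^T W X + U."""
    n, p = len(t), 2
    X = [sens(b, ti) for ti in t]
    e = [Y[i] - model(b, t[i]) for i in range(n)]
    XtW = [[sum(X[i][j] * W[i][r] for i in range(n)) for r in range(n)] for j in range(p)]
    Pinv = [[sum(XtW[j][r] * X[r][l] for r in range(n)) + U[j][l] for l in range(p)]
            for j in range(p)]
    g = [sum(XtW[j][r] * e[r] for r in range(n))
         + sum(U[j][l] * (mu[l] - b[l]) for l in range(p)) for j in range(p)]
    det = Pinv[0][0] * Pinv[1][1] - Pinv[0][1] * Pinv[1][0]
    db = [(Pinv[1][1] * g[0] - Pinv[0][1] * g[1]) / det,
          (-Pinv[1][0] * g[0] + Pinv[0][0] * g[1]) / det]
    return [b[0] + db[0], b[1] + db[1]]

b = list(mu)                 # initial estimates equal the prior means in this example
history = [b]
for k in range(4):
    b = gauss_step(b)
    history.append(b)

for k, bk in enumerate(history):
    print(k, round(bk[0], 6), round(bk[1], 6))
```

The printed iterates can be compared row by row with Table 7.1.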



7.4.3 Comments on Gauss Linearization Equation

Several comments and observations regarding the Gauss linearization
equation are given in this section.

(a) By letting $\mathbf{W} = \mathbf{I}$ and $\mathbf{U} = \mathbf{0}$, (7.4.6) provides ordinary least squares
estimates.

(b) If the observation errors $\boldsymbol{\varepsilon}$ satisfy the standard conditions of being
additive, zero mean, $\boldsymbol{\psi}$ is known within a multiplicative constant and the
independent variable(s) and $\boldsymbol{\beta}$ are nonstochastic (i.e., 11--011), then
nonlinear Gauss-Markov estimation can be used by setting $\mathbf{W} = \boldsymbol{\Omega}^{-1}$ and
$\mathbf{U} = \mathbf{0}$. (We are using $\boldsymbol{\psi} = \sigma^2\boldsymbol{\Omega}$ where $\boldsymbol{\Omega}$ is completely known.)

(c) If in addition to the conditions given in (b), the errors are normal,
the assumptions are designated 11--1011. Then (7.4.6) provides a nonlinear
ML estimator if $\mathbf{W} = \boldsymbol{\Omega}^{-1}$ and $\mathbf{U} = \mathbf{0}$.

(d) If the conditions in (c) are valid except there is prior information
and $\boldsymbol{\psi}$ is known, then an MAP estimator is provided by (7.4.6). Suppose
that $\boldsymbol{\beta}$ is a random parameter vector with a mean $\boldsymbol{\mu}_\beta$, covariance $\mathbf{V}_\beta$, and
normal probability density. The assumptions are then designated 11--1112.
By letting $\mathbf{W} = \boldsymbol{\psi}^{-1}$ and $\mathbf{U} = \mathbf{V}_\beta^{-1}$, the corresponding MAP estimator
is given by (7.4.6).

(e) In order to utilize (7.4.6) to estimate the parameters it is necessary
that $\mathbf{P}^{-1}$ have an inverse, that is, $\mathbf{P}^{-1}$ be nonsingular. Then its determinant
must not be zero or

$$|\mathbf{P}^{-1}| = |\mathbf{X}^T\mathbf{W}\mathbf{X} + \mathbf{U}| \neq 0 \tag{7.4.18}$$

in the region of the minimum. We call this the identifiability condition. If
this determinant is identically equal to zero, there is, in general, no unique
point at which the minimum occurs. A method does not give the complete
location of the minimum when it specifies a single parameter point while
there is more than one point at which the minimum occurs. Since the
Gauss method will not yield any point in this case, one is alerted to the
nonexistence of such a point.

For least squares estimation $\mathbf{W} = \mathbf{I}$ and $\mathbf{U} = \mathbf{0}$. For this case it is necessary
that

$$\Delta \equiv |\mathbf{X}^T\mathbf{X}| \neq 0 \tag{7.4.19}$$

in the neighborhood of the minimum of S. This determinant is shown to be
equal to zero in Appendix A if any column of $\mathbf{X}$ can be expressed as a
linear combination of other columns. This condition, linear dependence,
can be written as

$$\sum_{j=1}^{p} C_j X_{ij} = 0 \quad \text{for } i = 1, 2, \ldots, n \text{ for at least one } C_j \neq 0 \tag{7.4.20}$$

If (7.4.20) is true, then $\Delta$ given by (7.4.19) is equal to zero.

This condition of linear dependence is almost satisfied in many more
cases than would be expected. In such cases $\Delta$ is almost zero; this is what is
meant by ill-conditioning. See Section 6.7.6. If (7.4.20) is almost satisfied,
the sum of squares function S will have a unique minimum point and thus
a unique set of parameters. The minimum point will not be very
pronounced, however. As an example consider the sum of squares function

$$S = (2 - \beta_1 - e^{-\beta_2})^2 + (3 - 2\beta_1 - e^{-2\beta_2})^2$$

which is plotted in Fig. 7.2 for S = 0, 0.1, and 3. The sensitivity matrix for
the two points indicated in S is

$$\mathbf{X} = \begin{bmatrix} 1 & -e^{-\beta_2} \\ 2 & -2e^{-2\beta_2} \end{bmatrix}$$

which exhibits linear dependence for either $\beta_2 = 0$ or $\beta_2 \to \infty$. Since the
minimum S value is at $\beta_1 = 1$ and $\beta_2 = 0$, the condition of linear
dependence is almost satisfied in the region of the minimum point. Along the
long axis of the S = 0.1 contour the change in S is much more gradual
than in other directions. Contours that are long, narrow, and curving such
as S = 0.1 are frequently associated with near linear dependence or,
equivalently, with $\Delta$ being relatively small. Moreover, such contours are
typically associated with slow convergence of the Gauss method. For this
reason it is important to examine the sensitivity coefficients over the region
of interest.
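
The way the determinant of $\mathbf{X}^T\mathbf{X}$ collapses near the minimum of this example can be checked numerically. The sketch assumes that the two terms of S correspond to observations Y = 2 at t = 1 and Y = 3 at t = 2 of the model $\eta = \beta_1 t + \exp(-\beta_2 t)$, which is one reading consistent with the S given above; the function names are illustrative.

```python
import math

def sensitivity_matrix(beta2):
    """Sensitivities of eta = beta1*t + exp(-beta2*t) at t = 1 and t = 2 (independent of beta1)."""
    return [[1.0, -math.exp(-beta2)],
            [2.0, -2.0 * math.exp(-2.0 * beta2)]]

def det_XtX(beta2):
    """Determinant of X^T X for the 2 x 2 sensitivity matrix above."""
    X = sensitivity_matrix(beta2)
    a = X[0][0] ** 2 + X[1][0] ** 2
    b = X[0][0] * X[0][1] + X[1][0] * X[1][1]
    c = X[0][1] ** 2 + X[1][1] ** 2
    return a * c - b * b

for beta2 in [0.0, 0.1, 0.5, 1.0]:
    print(beta2, det_XtX(beta2))
```

The determinant is exactly zero at $\beta_2 = 0$, where the two columns are proportional, and stays small for nearby values, which is the near linear dependence described above.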

For ML estimation it is necessary that

$$|\mathbf{X}^T\boldsymbol{\psi}^{-1}\mathbf{X}| \neq 0 \tag{7.4.21}$$

but ML will not lead to a choice of $\boldsymbol{\psi}^{-1}$ which would cause this
determinant to be equal to zero if $|\mathbf{X}^T\mathbf{X}|$ is not equal to zero. Also if $|\mathbf{X}^T\mathbf{X}|$ is
equal to zero there is no $\boldsymbol{\psi}^{-1}$ which will make (7.4.21) be true. (See
Appendix A.) Hence the condition given by (7.4.19) is again the important
one.

For MAP estimation with $\mathbf{W} = \boldsymbol{\psi}^{-1}$ and $\mathbf{U} = \mathbf{V}_\beta^{-1}$ it is possible that
(7.4.18) may be true even if $|\mathbf{X}^T\mathbf{X}| = 0$. Thus if there is prior information
the sensitivity coefficients may not have to be independent to permit
estimation using (7.4.6).

(f) When convergence to the estimates is attained the matrix derivative
of S goes to zero as indicated by (7.4.4). Since the same terms appear
in (7.4.6) the $p \times 1$ vector given next must also go to zero at convergence,

$$\mathbf{X}^{T(k)}\mathbf{W}(\mathbf{Y} - \boldsymbol{\eta}^{(k)}) + \mathbf{U}(\boldsymbol{\mu} - \mathbf{b}^{(k)}) \to \mathbf{0} \tag{7.4.22}$$

This means that every component of this vector must be zero. For cases
when $\mathbf{U} = \mathbf{0}$ this results in each $H_j$ given by (7.4.16) being equal to zero.
Knowledge of this fact can sometimes aid in checking computer codes that
are not yielding converging solutions.

(g) Though (7.4.6) has been obtained by using the linear approximation
given by (7.2.1), the Gauss equation is not a rigorous first order
approximation because a first order series was not used for $\mathbf{X}^{(k+1)}$.


7.4.4 Linear Dependence of Sensitivity Coefficients 

As stated in the preceding subsection under point (e), the function S for
LS or ML estimation has no unique minimum point if the sensitivity
coefficients are linearly dependent. It has been found from experience that
difficulty encountered in convergence is frequently due to approximate
linear dependence. In most of these cases the sensitivity coefficients were
not plotted and examined beforehand. Indeed, in many cases the user of a
nonlinear least squares program may not realize their importance and thus
not even examine them after lack of rapid convergence is apparent. For
effective nonlinear estimation, the careful examination of these sensitivity
coefficients is imperative. In order to demonstrate what should be
inspected, the following discussion is given.

For single response cases with approximately constant standard
deviations of the measurements, it is convenient to examine

$$\tilde{X}_{ij} = \beta_j X_{ij} = \beta_j\frac{\partial\eta_i}{\partial\beta_j} \tag{7.4.23}$$

Note that $\tilde{X}_{ij}$ has the units of $\eta$. Then the magnitude of each sensitivity can
be compared with the others as well as with $\eta$ itself.

For multiresponse cases it is often more meaningful to plot 


( 0 = 




94 * (0 

9 ^. 


(7.4.24) 


which is dimensionless. (Note i refers to "time," j to the parameter, and k
to the response.)

In Figs. 7.3 and 7.4 some sensitivities are plotted versus the variable $t_i$.
Those in Fig. 7.3 are linearly dependent but those in Fig. 7.4 are not. The
first nine graphs in Fig. 7.3 are for two parameters; it is not difficult to see
the linear dependence in the sensitivities in each case. Note that the
location of the zero value on the X axes is not arbitrary in most cases. The
last three cases are for the three parameters being estimated
simultaneously; the linear dependence is less obvious than for two-parameter
cases.

The importance of the zero value of X is also shown by Figs. 7.4a, b, c, d,
and f, which are not linearly dependent cases as drawn, but each becomes
dependent if the zero is moved. What zero location would do this in each
case?



Figure 7.3 Examples of some linearly dependent sensitivity coefficients.

Figure 7.4 Examples of some linearly independent sensitivity coefficients.


7.5 EXAMPLES TO ILLUSTRATE GAUSS MINIMIZATION METHOD
INVOLVING ORDINARY DIFFERENTIAL EQUATIONS

7.5.1 Estimation of a Parameter for a Long Fin

Consider a long fin which has a temperature at its base, z = 0, of 200°C
and which is exposed to a fluid at 100°C; see Fig. 7.5. The differential
equation describing the steady-state temperature T in the fin is

$$\frac{d^2T}{dz^2} = M^2(T - T_\infty) \tag{7.5.1}$$

where

$$M^2 = \frac{hP}{kA} \tag{7.5.2}$$

Figure 7.5 Geometry for fin.

h is the heat transfer coefficient, k is the thermal conductivity of the fin, A
is the fin cross-sectional area normal to z, and P is the perimeter of A.
The boundary conditions are

$$T(0) = T_0, \quad T(\infty) = T_\infty \tag{7.5.3}$$

Equations 7.5.1 and 7.5.3 give a complete mathematical statement of the
classical boundary value problem. The solution for T assuming constant M
is

$$T = T_\infty + (T_0 - T_\infty)e^{-Mz} \tag{7.5.4}$$

which contains the three parameters $T_0$, $T_\infty$, and M. In a sense only M is a
parameter since $T_0$ and $T_\infty$ can be considered "states"; that is, they are the
temperatures at z = 0 and infinity, respectively. From examining (7.5.4), we
see that T is linear in $T_0$ and $T_\infty$ but nonlinear in terms of M.

Each of the three parameters ($T_0$, $T_\infty$, and M) enters the problem in
such a manner that all three could be found simultaneously, rather than
only in certain combinations. On the other hand, h, P, k, and A cannot be
found independently but only in the combination M. If other boundary
conditions were known, a different set of parameters might be found. For
example, if the heat flux q at z = 0 has a known value, the boundary
condition is

$$-k\frac{dT(0)}{dz} = q \tag{7.5.5}$$

and the parameters M, k, and $T_\infty$ can be simultaneously estimated.
Simulated data for this problem are given in Table 7.2. The assumptions
are

$$Y_i = T_i + \varepsilon_i$$

$$E(\varepsilon_i) = 0, \quad V(Y_1) = 0, \quad V(Y_6) = 0$$

$$\varepsilon_i \sim N(0, \sigma^2), \quad i = 2, 3, 4, 5$$

$$E(\varepsilon_i\varepsilon_j) = 0, \quad i \neq j \text{ and } i, j = 2, 3, 4, 5$$

The parameter M is nonrandom and there is no prior information. Using
our notation, these assumptions are designated 11111-11; the value of $\sigma^2$
need not be known to estimate parameters.


Table 7.2 Simulated Data for Fin Example

    i    Position z_i (m)    Temperature Y_i (°C)
    1    0                   200
    2    0.125               166
    3    0.250               144
    4    0.375               128
    5    0.5                 120
    6    ∞                   100

The assumptions $V(Y_1) = V(Y_6) = 0$ mean that $T_0$ and $T_\infty$ are the known
values of 200 and 100°C, respectively. The parameter is M. For the
standard assumptions given above, OLS and ML estimation provide the
same parameter estimates. The weighting matrix W in (7.3.1) is
$W = \psi^{-1} = \sigma^{-2} I$ since

$$\operatorname{cov}(\varepsilon) = E(\varepsilon \varepsilon^T) = \operatorname{diag}[\sigma^2 \ \ \sigma^2 \ \ \sigma^2 \ \ \sigma^2] \tag{7.5.6}$$

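The iterative relation for finding M can be found using (7.4.17) which
can be written as

$$M^{(k+1)} = M^{(k)} + \left[\sum_{i=2}^{5} X_i^{(k)} \left(Y_i - T_i^{(k)}\right)\right] \left[\sum_{j=2}^{5} \left(X_j^{(k)}\right)^2\right]^{-1} \tag{7.5.7}$$

where the sensitivity coefficient is

$$X_i^{(k)} = -(T_0 - T_\infty)\, z_i \exp\left(-M^{(k)} z_i\right) \tag{7.5.8}$$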


From a knowledge of heat transfer for this particular problem, an initial
estimate of M could be given. Many methods are available, however, that
use only the given data rather than relying on experience. One of these is
used in this example. Since only one parameter is unknown, let us pick a
single $Y_i$. Let us choose the value of $Y_3 = 144$°C which is nearest the
average of $T_0$ and $T_\infty$. For $T = 144$ and $z = 0.25$, (7.5.4) yields

$$144 = 100 + (200 - 100) \exp\left[-M^{(0)}(0.25)\right]$$





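which can be approximately solved for $M^{(0)} = 3.28\ \text{m}^{-1}$. Using this value
(7.5.4) gives the residual vector of

$$e^{(0)} = Y - T^{(0)} = \begin{bmatrix} 166 - 166.36502501 \\ 144 - 144.04316545 \\ 128 - 129.22925777 \\ 120 - 119.39800423 \end{bmatrix} = \begin{bmatrix} -0.36502501 \\ -0.04316545 \\ -1.22925777 \\ 0.60199577 \end{bmatrix}$$

The sum of squares associated with these residuals is $S^{(0)} = 2.008580086$
using $\sigma^2 = 1$. The sensitivity matrix and $X^T X$ are

$$X^{(0)} = \begin{bmatrix} -8.29562813 \\ -11.01079136 \\ -10.96097166 \\ -9.69900211 \end{bmatrix}, \qquad X^{T(0)} X^{(0)} = 404.268514$$

Using these values and $X^{T(0)} e^{(0)} = 11.13849884$ in (7.5.7) yields

$$M^{(1)} = 3.28 + \frac{11.13849884}{404.268514} = 3.28 + 0.0275522 = 3.3075522\ \text{m}^{-1}$$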

The associated sum of squares is $S^{(1)} = 1.700958954$. Note that

$$\frac{M^{(1)} - M^{(0)}}{M^{(0)}} \approx 0.0084$$

which is small compared with unity and thus the initial estimate was very
good. This can also be verified from an examination of the small residuals
in $e^{(0)}$. One might desire to minimize S more precisely. Then using the
above $M^{(1)}$ value we can find

$$e^{(1)} = \begin{bmatrix} -0.13685534 \\ 0.25916365 \\ -0.92881366 \\ 0.86739235 \end{bmatrix}, \qquad X^{(1)} = \begin{bmatrix} -8.26710692 \\ -10.93520909 \\ -10.84830512 \\ -9.56630382 \end{bmatrix}$$

$$M^{(2)} = 3.3075522 + \frac{0.07570426}{397.123747} = 3.3075522 + 0.0001906 = 3.3077428\ \text{m}^{-1}$$

$$\frac{M^{(2)} - M^{(1)}}{M^{(1)}} = 5.763 \times 10^{-5}, \qquad S^{(2)} = 1.70094504$$





A third iteration yields

$$e^{(2)} = \begin{bmatrix} -0.13527965 \\ 0.26124785 \\ -0.92674605 \\ 0.86921561 \end{bmatrix}, \qquad X^{(2)} = \begin{bmatrix} -8.26690996 \\ -10.93468804 \\ -10.84752977 \\ -9.56539220 \end{bmatrix}$$

$$M^{(3)} = 3.3077428 + \cdots = 3.3077433\ \text{m}^{-1}$$

$$\frac{M^{(3)} - M^{(2)}}{M^{(2)}} \sim 10^{-7}, \qquad S^{(3)} = 1.70094500$$

Several observations can be made from the above results. First, M seems
to be converging rapidly. The relative corrections are decreasing by a
factor smaller than 0.01. The sum of the components in $e^{(2)}$ is not zero,
unlike a model containing a parameter which has a constant sensitivity
vector. Note that the residual values changed much more between
iterations than the sensitivity vector (X) values. Another observation is that
S is decreasing as the iterations proceed.
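The iteration (7.5.7) with the sensitivity (7.5.8) is short enough to sketch in code. The following is a minimal illustration, not the authors' program; it assumes the four interior observations of Table 7.2, fixed end temperatures of 200 and 100°C, and a relative-change stopping rule whose tolerance is chosen here only for illustration. The function names are hypothetical.

```python
import math

# Interior observations of Table 7.2 (i = 2..5); the end points fix T0 and Tinf.
z = [0.125, 0.250, 0.375, 0.5]
Y = [166.0, 144.0, 128.0, 120.0]
T0, Tinf = 200.0, 100.0


def model(M):
    """Temperatures from (7.5.4) at the measurement positions."""
    return [Tinf + (T0 - Tinf) * math.exp(-M * zi) for zi in z]


def sensitivity(M):
    """Sensitivity coefficients dT/dM from (7.5.8)."""
    return [-(T0 - Tinf) * zi * math.exp(-M * zi) for zi in z]


def gauss_step(M):
    """One Gauss correction (7.5.7); returns the new M and S evaluated at the old M."""
    e = [yi - ti for yi, ti in zip(Y, model(M))]
    X = sensitivity(M)
    xte = sum(xi * ei for xi, ei in zip(X, e))
    xtx = sum(xi * xi for xi in X)
    return M + xte / xtx, sum(ei * ei for ei in e)


def gauss(M, tol=1e-9, max_iter=50):
    """Repeat gauss_step until the relative change in M is below tol (assumed rule)."""
    for k in range(1, max_iter + 1):
        M_new, _ = gauss_step(M)
        if abs(M_new - M) <= tol * abs(M_new):
            return M_new, k
        M = M_new
    raise RuntimeError("Gauss iteration did not converge")


if __name__ == "__main__":
    M1, S0 = gauss_step(3.28)
    print(f"S(0) = {S0:.6f}, M(1) = {M1:.7f}")
    for start in (0.0, 6.0, 8.0, 10.0):
        M_hat, iters = gauss(start)
        print(f"start {start:4.1f}: M = {M_hat:.7f} after {iters} steps")
```

Because the stopping rule here is tighter than seven significant figures, the step counts printed need not match the iteration counts quoted in the text; the converged value should.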

In the example given above the initial $M^{(0)}$ value is relatively close to the
converged value, resulting in only three iterations being required. Not
many iterations are required, however, for a range of initial values as large
as 0 to 10 (or even -3 to 10, as indicated by the $M^{(0)} = 10$ case) as shown
in Table 7.3. Eight or fewer iterations were required to converge to within
seven significant figures. The sum of squares for this example has the same
shape as given in Fig. 1.7.

Table 7.3 Parameter Values as a Function of Iteration
for Various Initial Estimates for Fin Example

  Iteration
  number     M(0) = 0     M(0) = 6     M(0) = 8      M(0) = 10
  0          0.00         6.0          8.0           10.0
  1          1.8186667    2.1484134    -0.0457249    -3.1824944
  2          2.9666083    3.0970338    1.782549      -1.1077316
  3          3.2883466    3.3001212    2.9506108     0.8839079
  4          3.3076357    3.3077155    3.2865352     2.4554228
  5          3.3077430    3.3077432    3.3076195     3.1918187
  6          3.3077433    3.3077433    3.3077430     3.3053045
  7                                    3.3077433     3.3077364
  8                                                  3.3077433

In order to design an experiment to obtain the greatest accuracy (minimum
variance of the parameters if there is no bias), the sensitivity
coefficients should be plotted and examined before the experiment is
performed. As a result, one can more intelligently design the experiment in
terms of placement of sensors and duration of the experiment. It is
suggested in Chapter 8 that a reasonable optimal experiment criterion is to
maximize $\Delta = |X^T X|$ for independent errors and subject to constraints of a
maximum duration of the experiment and maximum range of the dependent
variable.

Figure 7.6 depicts the dimensionless temperature and dimensionless
sensitivity coefficient for the example of this section. Note that the
sensitivity coefficient starts at zero at $z = 0$, increases in magnitude until $Mz = 1$,
and gradually decreases in magnitude. The $\Delta$ criterion for the single
parameter M is maximized by selecting the maximum magnitude values of
the sensitivity coefficient. If a single measurement is to be utilized in
estimation, it should be chosen corresponding to about $Mz = 1$, which
corresponds to the dimensionless temperature ratio of 0.368. Owing to the
flatness of the sensitivity curve shown in Fig. 7.6, little decrease in
accuracy would result if the dimensionless ratio were chosen to be as large
as 0.5 or as small as 0.25. A $Y_i$ value corresponding to the dimensionless T
ratio of 0.5 was chosen in the above example for obtaining the initial
estimate of $M \approx 3.28$. For the more common case of many observations,
see Chapter 8.
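The statement about the flatness of the sensitivity curve can be checked numerically. The sketch below is only an illustration: it assumes the dimensionless sensitivity magnitude for the model (7.5.4) is $Mz\,e^{-Mz}$ (that is, $|M\,\partial T/\partial M|/(T_0 - T_\infty)$), locates its peak on a coarse grid, and reports the sensitivity at temperature ratios of 0.5 and 0.25 as a fraction of the peak. All function names are hypothetical.

```python
import math


def sensitivity_magnitude(u):
    """|M * dT/dM| / (T0 - Tinf) for T from (7.5.4), written in terms of u = M*z."""
    return u * math.exp(-u)


def u_at_temperature_ratio(ratio):
    """Value of u = M*z where (T - Tinf)/(T0 - Tinf) = exp(-u) equals ratio."""
    return -math.log(ratio)


# Coarse grid search for the location of the largest sensitivity magnitude.
grid = [i / 1000.0 for i in range(1, 5001)]
u_best = max(grid, key=sensitivity_magnitude)
peak = sensitivity_magnitude(u_best)


def relative_sensitivity(ratio):
    """Sensitivity at a given temperature ratio, as a fraction of the peak value."""
    return sensitivity_magnitude(u_at_temperature_ratio(ratio)) / peak


if __name__ == "__main__":
    print(f"peak at M*z = {u_best:.3f}, temperature ratio = {math.exp(-u_best):.3f}")
    for r in (0.5, 0.368, 0.25):
        print(f"ratio {r:5.3f}: relative sensitivity = {relative_sensitivity(r):.3f}")
```

Under this assumption the sensitivity at either end of the quoted range is about 94 percent of its peak value, which is consistent with the remark that little accuracy is lost there.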



Figure 7.6 Dimensionless temperature and sensitivity coefficient (plotted against Mz) for
example of Section 7.5.1.




7.5.2 Example of Estimation of Parameters in Cooling Billet Problem 

A similar problem to the preceding one in terms of the differential
equation is that of cooling a billet (or any object) that has a negligible
temperature variation through it. The temperature of the billet changes
after it is placed in a fluid at a different temperature. An analysis of a
cooling billet was also given in Section 6.2.

Let $T$ be the temperature of the billet at any time $t$ and let $T_\infty$ be the
fluid temperature. A differential equation describing the temperature in
this case is

$$\rho c V \frac{dT}{dt} = hA\,(T_\infty - T) \tag{7.5.9}$$


where $\rho$ is density of the billet, $c$ is specific heat, $V$ is volume, $h$ is heat
transfer coefficient, and $A$ is billet heated area. Various terms could be
parameters, but the most common one would be the heat transfer
coefficient. For convenience, however, the factor $hA/\rho cV$ is considered as
the parameter; several parameters may be formed from it also.

Three cases are investigated below. For each one the initial temperature
is $T_0$, or $T(0) = T_0$. Also, $T_\infty$ is considered to be a constant. In the first
case, let $hA/\rho cV$ be the constant parameter $\beta$. The solution for T is then

$$\frac{T(t) - T_\infty}{T_0 - T_\infty} = e^{-\beta t} \tag{7.5.10}$$


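Another case is for $hA/\rho cV$ being a function of time. A possible function
is

$$\frac{hA}{\rho c V} = \beta_1 + \beta_2 t + \beta_3 t^2 \tag{7.5.11}$$

and the solution of the differential equation is

$$\frac{T(t) - T_\infty}{T_0 - T_\infty} = \exp\left[-\left(\beta_1 t + \beta_2 t^2/2 + \beta_3 t^3/3\right)\right] \tag{7.5.12}$$

A third possible model for $hA/\rho cV$ is

$$\frac{hA}{\rho c V} = \beta_1 + \beta_2 (T - T_\infty)^n \qquad \text{for } n \neq 0 \tag{7.5.13}$$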





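where $n$ could also be a parameter. The solution in this case is

$$\frac{T(t) - T_\infty}{T_0 - T_\infty} = e^{-\beta_1 t} \left[ 1 + \frac{\beta_2}{\beta_1} (T_0 - T_\infty)^n \left(1 - e^{-n\beta_1 t}\right) \right]^{-1/n} \qquad \text{for } n \neq 0 \tag{7.5.14}$$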

In each of the models given above, (7.5.10, 12, 14), T is nonlinear in
terms of the parameters. Only for the last model of h, (7.5.13), was the
differential equation nonlinear.

If the factor $hA/\rho cV$ (or more specifically $h$) varies during an experiment,
(7.5.12) and (7.5.14) provide a number of competing models. For
example, $\beta_2$ and/or $\beta_3$ might be set equal to zero in (7.5.12). In (7.5.14), $\beta_2$
might be zero or $n$ might be unity, etc.
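A closed-form solution of a nonlinear differential equation is worth checking against a direct numerical integration before it is used for estimation. The sketch below does this for the third model, assuming the solution has the form $(T - T_\infty)/(T_0 - T_\infty) = e^{-\beta_1 t}\,[1 + (\beta_2/\beta_1)(T_0 - T_\infty)^n (1 - e^{-n\beta_1 t})]^{-1/n}$. The classical fourth-order Runge-Kutta integrator, the step count, and the function names are choices made here for illustration only.

```python
import math


def billet_closed_form(t, b1, b2, n, T0, Tinf):
    """Temperature from the closed-form solution quoted in the lead-in."""
    theta0 = T0 - Tinf
    C = 1.0 + (b2 / b1) * theta0**n * (1.0 - math.exp(-n * b1 * t))
    return Tinf + theta0 * math.exp(-b1 * t) * C ** (-1.0 / n)


def billet_rk4(t_end, b1, b2, n, T0, Tinf, steps=2000):
    """Integrate d(theta)/dt = -(b1 + b2*theta**n)*theta with classical RK4."""
    def rate(theta):
        return -(b1 + b2 * theta**n) * theta

    theta = T0 - Tinf          # theta = T - Tinf
    dt = t_end / steps
    for _ in range(steps):
        k1 = rate(theta)
        k2 = rate(theta + 0.5 * dt * k1)
        k3 = rate(theta + 0.5 * dt * k2)
        k4 = rate(theta + dt * k3)
        theta += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return Tinf + theta


if __name__ == "__main__":
    # Illustrative values only; time is in hours and temperatures in deg F.
    args = dict(b1=2.0, b2=0.004, n=1.0, T0=280.0, Tinf=80.0)
    for t in (0.1, 0.5, 1.0):
        exact = billet_closed_form(t, **args)
        numeric = billet_rk4(t, **args)
        print(f"t = {t:3.1f} h: closed form {exact:.6f}, RK4 {numeric:.6f}")
```

Setting `b2 = 0` reduces the expression to the simple exponential of the constant-coefficient case, which gives a second quick check.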

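The sensitivity coefficients for the second model, (7.5.12), are found to
be

$$X_i(t) = \frac{\partial T}{\partial \beta_i} = -(T_0 - T_\infty) \left(\frac{t^i}{i}\right) \exp\left[-\left(\beta_1 t + \beta_2 t^2/2 + \beta_3 t^3/3\right)\right] \tag{7.5.15}$$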

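where $i = 1$ for $\beta_1$, etc. For the third model (7.5.14), the sensitivities are

$$X_1(t) = \frac{\partial T}{\partial \beta_1} = -(T - T_\infty) \left\{ t + \frac{\beta_2 (T_0 - T_\infty)^n}{n C \beta_1} \left[ n t\, e^{-n\beta_1 t} - \frac{D}{\beta_1} \right] \right\} \tag{7.5.16}$$

$$X_2(t) = \frac{\partial T}{\partial \beta_2} = -\frac{(T_0 - T_\infty)^{n+1}\, e^{-\beta_1 t}\, D}{n \beta_1 C^{(n+1)/n}} \tag{7.5.17}$$

$$X_3(t) = \frac{\partial T}{\partial n} = (T - T_\infty) \left\{ \frac{\ln C}{n^2} - \frac{\beta_2 (T_0 - T_\infty)^n}{n C \beta_1} \left[ D \ln(T_0 - T_\infty) + \beta_1 t\, e^{-n\beta_1 t} \right] \right\} \tag{7.5.18a}$$

where

$$D = 1 - e^{-n\beta_1 t} \tag{7.5.18b}$$

and where $C$ is the expression in the brackets of (7.5.14).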




For estimating h the above models are superior to the power series 
model for T given in Section 6.2 because the above models utilize basic 
physical laws while the power series for T does not. Whenever possible, 
models based on the physical mechanisms (called mechanistic models by 
G. E. P. Box) should be employed. 

A further choice based on physical arguments can be made between the 
power series in t given by (7.5.11) and the temperature-dependent model 
for hA/pcV given by (7.5.13). In many situations the heat transfer 
coefficient does change with time because some related quantity is chang- 
ing with time rather than the passage of time per se. In the present model, 
h might change because a billet’s temperature changes with time. Physi- 
cally, the heat transfer coefficient h might account for heat transfer by 
both natural convection and radiation, which could both cause $h$ to be a
function of $T - T_\infty$. This suggests that the model (7.5.13) would be superior
to (7.5.11).

To illustrate parameter estimation involving the above models, consider
again the measurements given in Table 6.2. Results of calculations are
summarized in Table 7.4. Ordinary least squares was used with the $\eta_i$
values being the $T$ values of (7.5.12) or (7.5.14) and with $T_0 = 279.59$°F, the
$T$ at $t = 0$, and $T_\infty = 81.5$°F. Models 1, 2, and 3 are for $hA/\rho cV$ given by
(7.5.11) with Model 1 being $\beta_1$, Model 2 being $\beta_1 + \beta_2 t$, and so on. Model 4
is for $hA/\rho cV$ given by (7.5.13) with $n = 1$.


Table 7.4 Estimation of Parameters in Models for Cooling Billet Data(a)

  Model   No. of                   Parameters                                   s =
  No.     Parameters   b_1         b_2          b_3          R            [R/(n-p)]^(1/2)
  1       1            2.70882                               38.73731     1.6070
  2       2            2.90679     -1.39433                  0.7896890    0.23750
  3       3            2.90824     -1.41968     0.071344     0.7892203    0.24639
  4       2            2.13656     0.0041327                 1.162777     0.28819

(a) Units consistent with time in hours.


The $h$ values in units of Btu/hr-ft²-°F can be found using the appropriate
model [(7.5.11) or (7.5.13)] and multiplying by the $\rho cV/A$ value
of 0.83432. (This means that $b_1$ of Fig. 7.7 is 0.83432 times $b_1$ of Table 7.4.)
Resulting curves for Models 1, 2, and 3 are shown in Fig. 7.7. Also depicted are
the Fig. 6.2 results for the temperature power series analysis.

Notice that Models 2 and 3 results are almost identical so that Model 3
is not needed. The results of Model 2 and the power series model are very
similar. The constant $h$ given by Model 1 does not appear to be adequate
from inspection of Fig. 7.7 because the other results are quite different and
appear to be consistent with each other. Another argument that suggests
that Model 1 is not adequate is the large reduction in the sum of squares
function (38.7 to 0.789). This sum R also suggests that Model 3 is not
needed because the decrease is very slight between Models 2 and 3; further
note that the estimated standard error* of the temperatures actually
increases for Model 3. For further related discussion, see Example 7.7.4.
Model 4 results given in Table 7.4 show a slightly increased value of R
compared with Model 2, which also has two parameters. Even though R for
Model 4 is larger, one might prefer Model 4 because it represents a more
reasonable physical model as mentioned above.

An attempt to obtain simultaneous estimates of $\beta_1$, $\beta_2$, and $n$ in (7.5.14)
with $n = 1$ initially was unsuccessful. A further calculation was performed
for Model 4 to estimate just $n$ with the converged values of $\beta_1$ and $\beta_2$ given
in Table 7.4. The value obtained was 1.000006. Since this value is nearly
unity, the value previously used, linear dependence in the sensitivity
coefficients is suggested. To investigate this previously unsuspected
dependence, the sensitivity coefficients were plotted as shown in Fig. 7.8. Notice
that the sensitivity coefficients for $n$ and $\beta_2$ are very nearly proportional,
which tends to make $X^T X$ singular and thus the parameters very difficult
to estimate simultaneously.

*In Table 7.4 the value of n = 16 was used rather than 17, the number of the observations,
because the first value was used to determine $T_0$.

Figure 7.8 Sensitivity coefficients for Model 4 of Table 7.4 for n = 1.

Let us now compare from another point of view these results based on
the solution of the describing differential equation with those from the
power series method of Section 6.2. The results of this section required the
evaluation of two nonlinear parameters whereas the Section 6.2 method 
requires the estimation of five linear parameters. This is an illustration of 
the principle of parsimony which states that we employ the smallest possible 
number of parameters for adequate representation [22]. Also, note that the 
estimated standard deviation of the measured temperatures was 0.243 °F 
for the T series, 0.238°F for Model 2, and 0.288°F for Model 4; the small 
differences between these values indicate that the mechanistic models are 
good in this case. 

After obtaining parameters using OLS or ML, say, it is advisable to 
examine the residuals to see what assumptions seem to be valid regarding 
the measurement errors. The residuals for Model 2 are very similar to
those given in the upper part of Fig. 6.1. Visual inspection of these
residuals does not lead to a contradiction of the assumptions of additive, 
zero mean, constant variance, independent, and normal errors. Hence 
accurate parameter estimates would be expected with the least squares 
method used for this problem. 



7.6 MODIFICATIONS OF GAUSS METHOD 

The Gauss method has the feature of giving both the direction and the
magnitude of the change in the estimate of the parameters at each step in
the iteration procedure. Small changes in the parameters in the direction
indicated by the Gauss method decrease the sum of squares. Occasionally,
however, the size of the change indicated by the method is so large that the
successive estimates oscillate and, even worse, the procedure may be
unstable. This can result from near-linear dependence of the sensitivity
coefficients and/or very poor initial parameter estimates. When the
sensitivity coefficients are nearly dependent (which might be termed
"overparameterization"), one should consider alternatives in addition to other
minimization procedures. One obvious procedure is to decrease the number
of parameters being estimated. Another is to redesign the experiment
so that the correlation between parameters is reduced. See Chapter 8 for
optimal experiment design.

A great many algorithms have been proposed to improve the convergence
of the Gauss method. Some of these may be termed modifications to
the Gauss method, whereas others some would call distinctly
different methods; in the latter category are the Levenberg [3] and
Marquardt [4] methods. We choose to treat these methods as modifications
of the Gauss method, however.

This section considers just a few of the possible methods. In this as in
other iterative problems there appears to be no end to the possibilities; the
ingenuity of various researchers evidenced by the numerous algorithms for
this problem is impressive. For a survey see Bard [1, 5].


7.6.1 Box-Kanemasu Interpolation Method

Since the Gauss method depends on a linear approximation to $\eta$, in some
nonlinear estimation cases the corrections can oscillate with increasing
amplitudes and thus lead to nonconvergence. In this section we give the
Box-Kanemasu modification of the Gauss method, which may converge
when the Gauss method does not. The Box-Kanemasu method does not,
however, include a check that the sum of squares function S decreases
from iteration to iteration. Bard [5] has made the point that all acceptable
methods should ensure that S does monotonically decrease to a minimum.
This is a reasonable requirement but in some cases it may lead to more
calculations than without it; this is illustrated by Table 7.6, which is
discussed later. On the other hand, this requirement might improve
convergence in other cases. In order to ensure S continually decreases, a
modification to the Box-Kanemasu method that has been used by Bard [5] and
others is included.

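Since the linear approximation is valid over some region, a sufficiently
small correction in the direction given by the Gauss method should
improve the estimate (i.e., reduce S). Many methods have been proposed
which use the direction provided by the Gauss method but modify the step
size. We generalize (7.4.6) to

$$\mathbf{b}^{(k+1)} = \mathbf{b}^{(k)} + h^{(k+1)} \Delta_g \mathbf{b}^{(k)} \tag{7.6.1}$$

$$\Delta_g \mathbf{b}^{(k)} = \mathbf{P}^{(k)} \left[ \mathbf{X}^{T(k)} \mathbf{W} \left(\mathbf{Y} - \boldsymbol{\eta}^{(k)}\right) + \mathbf{U} \left(\boldsymbol{\mu} - \mathbf{b}^{(k)}\right) \right] \tag{7.6.2}$$

where $h^{(k+1)}$ is a scalar interpolation factor. Note that this factor may be
iteration-dependent. If $h^{(k+1)}$ is set equal to 1, we have the Gauss method.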

In one class of methods, a search on $h$ is performed to precisely
determine the minimum S along the Gauss direction [5].

Interpolation methods attempt to find good, acceptable values of $h^{(k+1)}$
without bothering to locate precisely the value associated with the
minimum S value. Of the many methods possible, one of these is the halving
and doubling method [6-8]. The modification that we describe utilizes an
equation given by Box and Kanemasu [9]. The modification is more
general, however, since (a) the sum of squares function given by (7.3.1) is
used rather than the OLS S function and (b) a check for decreasing S is
included.

In the Box-Kanemasu method, S is approximated at each iteration by

$$S = a_0 + a_1 h + a_2 h^2 \tag{7.6.3}$$

where $a_0$, $a_1$, and $a_2$ are constants characteristic of each iteration. The
$h^{(k+1)}$ value is taken where S given by (7.6.3) is minimized.

A second approximation in this method is that $\boldsymbol{\beta}$ is given by

$$\boldsymbol{\beta} = \mathbf{b}^{(k)} + h\, \Delta_g \mathbf{b}^{(k)} \tag{7.6.4}$$

A minimum of three conditions are needed to find the parameters $a_0$, $a_1$,
and $a_2$. One condition is to use the S value at $h = 0$, that is, at $\boldsymbol{\beta} = \mathbf{b}^{(k)}$; this
S value is designated $S_0^{(k)}$. A second S value, denoted $S_\alpha^{(k)}$, is found at
$h = \alpha$. Initially $\alpha$ is set equal to 1.

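The third condition for finding the $a_i$'s uses (7.6.4) to find the derivative
of S at $h = 0$ and in the $\Delta_g \mathbf{b}^{(k)}$ direction. This derivative is

$$\left.\frac{dS}{dh}\right|_{h=0} = \left.\left(\nabla_\beta S\right)^T\right|_{h=0} \left.\frac{\partial \boldsymbol{\beta}}{\partial h}\right|_{h=0} \tag{7.6.5}$$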




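The matrix derivative $\nabla_\beta S$ is found from (7.4.1) to be

$$\left.\nabla_\beta S\right|_{h=0} = -2 \left[ \mathbf{X}^{T(k)} \mathbf{W} \left(\mathbf{Y} - \boldsymbol{\eta}^{(k)}\right) + \mathbf{U} \left(\boldsymbol{\mu} - \mathbf{b}^{(k)}\right) \right] \tag{7.6.6a}$$

and the derivative of $\boldsymbol{\beta}$ with respect to $h$ is found from (7.6.4) to be

$$\frac{\partial \boldsymbol{\beta}}{\partial h} = \Delta_g \mathbf{b}^{(k)} \tag{7.6.6b}$$

Then using (7.6.6a,b) in (7.6.5) yields

$$\left.\frac{dS}{dh}\right|_{h=0} = -2 G^{(k)} \tag{7.6.7}$$

$$G^{(k)} = \left[\Delta_g \mathbf{b}^{(k)}\right]^T \left[ \mathbf{X}^{T(k)} \mathbf{W} \left(\mathbf{Y} - \boldsymbol{\eta}^{(k)}\right) + \mathbf{U} \left(\boldsymbol{\mu} - \mathbf{b}^{(k)}\right) \right] \tag{7.6.8a}$$

$$\phantom{G^{(k)}} = \left[\Delta_g \mathbf{b}^{(k)}\right]^T \left[\mathbf{P}^{(k)}\right]^{-1} \Delta_g \mathbf{b}^{(k)} \tag{7.6.8b}$$

Note that $G^{(k)}$ is a scalar so that it is also equal to its transpose. From the
definition of G it can be proved that $G \geq 0$.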
Using the three conditions for S yields

$$a_0 = S_0^{(k)}, \qquad a_1 = -2 G^{(k)}, \qquad a_2 = \left[ S_\alpha^{(k)} - S_0^{(k)} + 2 G^{(k)} \alpha \right] \alpha^{-2} \tag{7.6.9}$$

The minimum S is located where the derivative of S [given by (7.6.3)] with
respect to $h$ is equal to zero; it occurs at the $h$ value of $-a_1/(2a_2)$ or

$$h^{(k+1)} = G^{(k)} \alpha^2 \left[ S_\alpha^{(k)} - S_0^{(k)} + 2 G^{(k)} \alpha \right]^{-1} \tag{7.6.10}$$

This $h$ value is used in (7.6.1) to find the $(k+1)$st iterate for the $\mathbf{b}$ vector.
The equation given by Box and Kanemasu [9] is obtained from (7.6.10) by
setting $\alpha = 1$. An equation similar to (7.6.10) is given by Hartley [8] and is
attributed to Dr. K. Ruedenburg.

There are some restrictions on the use of (7.6.10). These relate to the
possible values of $a_2$. See Fig. 7.9. Three different cases are for $a_2 = 0$,
$a_2 < 0$, and $a_2 > 0$, which are discussed individually below.

In each case a condition suggested by Bard [5] is that

$$S_\alpha^{(k)} < S_0^{(k)} \tag{7.6.11}$$

The parameter $\alpha$ is made sufficiently small for this condition to be
satisfied. If this inequality is not true for $\alpha = 1$, $\alpha$ is made 1/2 and the
inequality is checked again. Should the inequality require the investigation
of $\alpha$ values less than 0.01, say, the calculations are terminated. It may be
that the problem has been incorrectly programmed; for example, the
sensitivity coefficients may be incorrect. It is also possible that the
sensitivity coefficients are nearly linearly dependent. In the Box-Kanemasu
method, the inequality given by (7.6.11) is not considered and $\alpha$ is always
one.

Figure 7.9 Sum of squares versus the h parameter for the Box-Kanemasu method using
(7.6.3) for approximating S.

The first case that we consider is for $a_2 = 0$. If this occurs the S
expression as given by (7.6.3) is a straight line which has no minimum at
any finite value of $h$; see Fig. 7.9. Hence for this case we set $h^{(k+1)} = A\alpha$
where A is some constant equal to or slightly larger than unity; one
possible value is 1.1.

The second case is for $a_2 < 0$, which would cause $h$ given by (7.6.10) to be
negative. Again we set $h^{(k+1)} = A\alpha$.

The third and most interesting case is for $a_2 > 0$, shown in Fig. 7.9. For
this case $h^{(k+1)}$ is calculated using (7.6.10) provided $h^{(k+1)} \leq A\alpha$; if the
inequality is not satisfied, again we set $h^{(k+1)} = A\alpha$.

All three cases can be included by requiring that the inequality,

$$S_\alpha^{(k)} \geq S_0^{(k)} - \left(2 - A^{-1}\right) \alpha\, G^{(k)} \tag{7.6.12}$$

be satisfied in order to use (7.6.10). If it is not satisfied, we suggest that
$h^{(k+1)}$ be set equal to $A\alpha$ and the calculation proceeds to the next iteration.
A computer program flow chart incorporating the above constraints is
given by Fig. 7.10. In addition to the two inequalities given by (7.6.11) and
(7.6.12) there is a check on the sign of G. From the definition of G, it must
be positive; thus if it is negative something is incorrect. It is important in
such a program to calculate $S_0^{(k)}$ and $S_\alpha^{(k)}$ correctly; the same weighting $\mathbf{W}$
must be used as is used to evaluate $\Delta_g \mathbf{b}^{(k)}$, and the term involving $\mathbf{U}$ in
(7.3.1) must be included if it is also implied in the $\Delta_g \mathbf{b}^{(k)}$ calculation.

Figure 7.10 Flow chart of a procedure using the Box-Kanemasu equation.










For the Box-Kanemasu method, the section of the flow chart in Fig. 
7.10 that is enclosed in a box with dashed lines would be bypassed and a 
would be always unity. 
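The step-size logic described in this section can be sketched compactly. The code below is an illustration under stated assumptions rather than a transcription of the flow chart: it assumes the decrease check is $S_\alpha < S_0$ with $\alpha$ halved on failure, that the parabola minimum $G\alpha^2/(S_\alpha - S_0 + 2G\alpha)$ is accepted only when $S_\alpha \geq S_0 - (2 - A^{-1})\alpha G$, and that $h = A\alpha$ is used otherwise. The function name, the callable `S_of_h`, and the default constants are hypothetical.

```python
def box_kanemasu_h(S_of_h, S0, G, A=1.1, alpha_min=0.01, modified=True):
    """Return (h, alpha) for one iteration.

    S_of_h(h) must return the sum of squares at b + h * (Gauss correction);
    S0 is its value at h = 0 and G is minus one half of dS/dh at h = 0.
    """
    if G <= 0.0:
        raise ValueError("G must be positive; check sensitivities and weighting")
    alpha = 1.0
    S_alpha = S_of_h(alpha)
    if modified:
        # Decrease check: halve alpha until the trial point lowers S.
        while S_alpha >= S0:
            alpha *= 0.5
            if alpha < alpha_min:
                raise RuntimeError("alpha fell below alpha_min; terminating")
            S_alpha = S_of_h(alpha)
    # Use the parabola minimum only when it does not exceed A * alpha.
    if S_alpha >= S0 - (2.0 - 1.0 / A) * alpha * G:
        h = G * alpha**2 / (S_alpha - S0 + 2.0 * G * alpha)
    else:
        h = A * alpha
    return h, alpha


if __name__ == "__main__":
    # Synthetic check: when S really is a parabola, the rule recovers its minimum.
    S0, G = 10.0, 4.0
    for a2 in (5.0, 1.0, 50.0):
        parabola = lambda h, a2=a2: S0 - 2.0 * G * h + a2 * h * h
        print(a2, box_kanemasu_h(parabola, S0, G))
```

For an exactly quadratic S the returned h equals $G/a_2$ whenever that value does not exceed $A\alpha$, which gives a simple way to test an implementation before applying it to a real model.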

Example 7.6.1 

For the model $\eta = \beta_1 t + \exp(-\beta_2 t)$ and the observations of 2 at $t = 1$ and 3 at $t = 2$,
use the Box-Kanemasu method for one step starting at $\beta_1 = 1$ and $\beta_2 = 2$. Use OLS.


Solution 

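The S function for this case is shown in Fig. 7.2. For OLS the matrix correction
is

$$\Delta_g \mathbf{b}^{(k)} = \left(\mathbf{X}^{T(k)} \mathbf{X}^{(k)}\right)^{-1} \mathbf{X}^{T(k)} \left(\mathbf{Y} - \boldsymbol{\eta}^{(k)}\right)$$

For the first iteration, $k = 0$ and $\mathbf{X}^{T(0)}$ is

$$\mathbf{X}^{T(0)} = \begin{bmatrix} 1 & 2 \\ -\exp\left(-b_2^{(0)}\right) & -2\exp\left(-2 b_2^{(0)}\right) \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ -0.1353 & -0.03663 \end{bmatrix}$$

Also $\left(\mathbf{X}^{T(0)} \mathbf{X}^{(0)}\right)^{-1}$ and $\mathbf{X}^{T(0)}\left(\mathbf{Y} - \boldsymbol{\eta}^{(0)}\right)$ are

$$\left(\mathbf{X}^{T(0)} \mathbf{X}^{(0)}\right)^{-1} = \frac{1}{0.05477} \begin{bmatrix} 0.01966 & 0.2086 \\ 0.2086 & 5 \end{bmatrix}$$

$$\mathbf{X}^{T(0)}\left(\mathbf{Y} - \boldsymbol{\eta}^{(0)}\right) = \begin{bmatrix} 1 & 2 \\ -0.1353 & -0.03663 \end{bmatrix} \begin{bmatrix} 0.8647 \\ 0.9817 \end{bmatrix} = \begin{bmatrix} 2.828 \\ -0.1529 \end{bmatrix}$$

which give $[\Delta_g \mathbf{b}^{(0)}]^T = [0.4323 \ \ -3.195]$. Hence the Gauss parameter estimates are
1.4323 and -1.195, which results in an S value of 123.42. Such a large relative
change in $b_2$ (-3.195 compared with 2.0) suggests that "overshooting" and
nonconvergence might occur using the Gauss method. The method does converge for
this example, however, as shown below.

For the Box-Kanemasu method we check the inequality (7.6.12) for $\alpha = 1$. If the
number of observations $n$ equals the number of parameters $p$, it can be shown that
$G^{(k)} = S_0^{(k)}$. In this example $n = p = 2$. Then using $A = 1.1$, (7.6.12) gives

$$S_1^{(0)} = 123.42 > S_0^{(0)} - \left(2 - A^{-1}\right) G^{(0)} = 1.7113 - \left(2 - 1.1^{-1}\right) 1.7113 = -0.1556$$

so that (7.6.10) can be used to get the small value of

$$h^{(1)} = G^{(0)} \left[ S_1^{(0)} - S_0^{(0)} + 2 G^{(0)} \right]^{-1} = 0.0137$$

With this value we use (7.6.1) to get

$$\mathbf{b}^{(1)} = \mathbf{b}^{(0)} + h^{(1)} \Delta_g \mathbf{b}^{(0)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0.0137 \begin{bmatrix} 0.4323 \\ -3.195 \end{bmatrix} = \begin{bmatrix} 1.0059 \\ 1.9563 \end{bmatrix}$$

which has an associated S value of 1.6645, a value lower than $S_0^{(0)} = 1.7113$.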


For the modified Box-Kanemasu method use (7.6.11) to check if $S_1^{(0)} =
123.42$ is less than $S_0^{(0)}$. Since it is not, $\alpha$ is made equal to 1/2 and $S_{1/2}^{(0)}$ is
calculated at $\mathbf{b}^{(0)} + \tfrac{1}{2} \Delta_g \mathbf{b}^{(0)}$ to get $S_{1/2}^{(0)} = 0.0279$; now the inequality given by (7.6.11)
is satisfied. Hence set $\alpha = \tfrac{1}{2}$. The right side of (7.6.12), which is

$$S_0^{(0)} - \left(2 - A^{-1}\right) \alpha\, G^{(0)} = 1.7113 - \left(2 - 1.1^{-1}\right)\left(\tfrac{1}{2}\right)(1.7113) = 0.778$$

is found to be greater than $S_{1/2}^{(0)}$ so that (7.6.12) is not satisfied. Consequently $h^{(1)}$ is
calculated using

$$h^{(1)} = A\alpha = 1.1\left(\tfrac{1}{2}\right) = 0.55$$

which gives $S^{(1)} = 0.00884$. If we seek the location of the minimum S for this
iteration we get $h = 0.5336$ and $S = 0.000862$. It is partially fortuitous that the
modified Box-Kanemasu method happened to yield an $h$ value so near this latter
value. As a result, the modified Box-Kanemasu method at the end of the first
iteration is much nearer the minimum than the other two methods.

A summary of the $h$ and S values for the three methods for several iterations is
given below. The modified Box-Kanemasu method converging most rapidly of the
three methods was a direct result of the excellent choice of $h = 0.55$ in the first
iteration. As is shown in Section 7.6.4.3, the same method is not most efficient for
all cases.
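The first-iteration numbers of Example 7.6.1 can be reproduced with a few lines of code. The sketch below is an independent check, not the authors' program. It assumes OLS, the model $\eta = \beta_1 t + \exp(-\beta_2 t)$ with observations 2 at $t = 1$ and 3 at $t = 2$, a start at (1, 2), the constant $A = 1.1$, and the acceptance inequalities in the forms $S_\alpha < S_0$ and $S_\alpha \geq S_0 - (2 - A^{-1})\alpha G$. Function names are hypothetical.

```python
import math

t_obs = [1.0, 2.0]
Y = [2.0, 3.0]
A = 1.1  # constant multiplying alpha when the parabola minimum is rejected


def eta(b):
    return [b[0] * t + math.exp(-b[1] * t) for t in t_obs]


def S(b):
    return sum((y - m) ** 2 for y, m in zip(Y, eta(b)))


def gauss_direction(b):
    """Return the OLS Gauss correction and G = delta^T X^T e (two parameters)."""
    e = [y - m for y, m in zip(Y, eta(b))]
    X = [[t, -t * math.exp(-b[1] * t)] for t in t_obs]   # d eta_i / d b_j
    c11 = sum(r[0] * r[0] for r in X)
    c12 = sum(r[0] * r[1] for r in X)
    c22 = sum(r[1] * r[1] for r in X)
    g1 = sum(r[0] * ei for r, ei in zip(X, e))
    g2 = sum(r[1] * ei for r, ei in zip(X, e))
    det = c11 * c22 - c12 * c12
    delta = [(c22 * g1 - c12 * g2) / det, (c11 * g2 - c12 * g1) / det]
    return delta, delta[0] * g1 + delta[1] * g2


def step(b, delta, h):
    return [b[0] + h * delta[0], b[1] + h * delta[1]]


def choose_h(b, delta, G, modified):
    S0 = S(b)
    alpha = 1.0
    S_alpha = S(step(b, delta, alpha))
    while modified and S_alpha >= S0:          # decrease check, halving alpha
        alpha *= 0.5
        if alpha < 0.01:
            raise RuntimeError("alpha below 0.01; terminating")
        S_alpha = S(step(b, delta, alpha))
    if S_alpha >= S0 - (2.0 - 1.0 / A) * alpha * G:
        return G * alpha**2 / (S_alpha - S0 + 2.0 * G * alpha)
    return A * alpha


b0 = [1.0, 2.0]
delta0, G0 = gauss_direction(b0)
h_bk = choose_h(b0, delta0, G0, modified=False)
h_mod = choose_h(b0, delta0, G0, modified=True)

if __name__ == "__main__":
    print("Gauss correction:", delta0, " S after full step:", S(step(b0, delta0, 1.0)))
    print("Box-Kanemasu h:", h_bk, " S:", S(step(b0, delta0, h_bk)))
    print("modified h:", h_mod, " S:", S(step(b0, delta0, h_mod)))
```

The contrast between the two h values shows how strongly the first step depends on whether the trial point at $\alpha = 1$ is accepted as the basis for the parabola.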


  Iteration         Gauss                Box-Kanemasu           Modified Box-Kanemasu
  Number        h      S               h        S               h        S
  0             -      1.7113          -        1.7113          -        1.7113
  1             1.0    1.234×10^2      0.0137   1.664           0.550    8.7×10^-3
  2             1.0    4.559           0.0208   1.595           0.924    8.8×10^…
  3             1.0    5.520×10^…      0.036    1.48            0.955    5.2×10^…
  4             1.0    6.19×10^…       0.077    1.25            0.951    3×10^…
  6             1.0    5.5×10^…        0.638    1.48×10^…       0.944    1.74×10^…
  8             1.0    3.1×10^…        0.372    2.8×10^…        0.942    1×10^…
  10            1.0    1.37×10^…       0.444    6.6×10^…        0.941    6×10^…
  15            1.0    1.4×10^…        0.933    3.9×10^…        0.941    2×10^…


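7.6.2 Levenberg Damped Least Squares Method

Levenberg [3] tried to overcome the instability of "overshooting" in the
Gauss method by introducing constraints into the minimization of S. The
function that Levenberg considered was the OLS sum of squares function
plus an additional term. By using a WLS function we can generalize
Levenberg's function by using (7.3.1) with $\boldsymbol{\mu}$ being $\mathbf{b}^{(k)}$, $\boldsymbol{\beta}$ being $\mathbf{b}^{(k+1)}$, and
$\mathbf{U}$ being replaced by $\lambda\boldsymbol{\Omega}$ where $\boldsymbol{\Omega}$ is a diagonal matrix. Using these
definitions in (7.4.6) gives

$$\mathbf{b}^{(k+1)} = \mathbf{b}^{(k)} + \left[ \mathbf{X}^{T(k)} \mathbf{W} \mathbf{X}^{(k)} + \lambda \boldsymbol{\Omega} \right]^{-1} \mathbf{X}^{T(k)} \mathbf{W} \mathbf{e}^{(k)} \tag{7.6.13}$$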

The effects of the $\mathbf{U} = \lambda\boldsymbol{\Omega}$ matrix are to reduce the size and to change the
direction of the step. Provided there is a unique minimum which is the
only stationary point and the iteration procedure given by (7.6.13)
converges, the estimates found would be those sought. The presence of the
$\lambda\boldsymbol{\Omega}$ term tends to reduce oscillations or instabilities, particularly as the
diagonal components of $\lambda\boldsymbol{\Omega}$ are made relatively large compared to the
diagonal terms in $\mathbf{X}^T \mathbf{W} \mathbf{X}$.

Box and Kanemasu [9], in describing the changes in the estimates of the
parameters in progressing to the minimum of S, state that the term
$\lambda \left(\mathbf{b}^{(k+1)} - \mathbf{b}^{(k)}\right)^T \boldsymbol{\Omega} \left(\mathbf{b}^{(k+1)} - \mathbf{b}^{(k)}\right)$ introduces a spherical constraint that
causes a spiral path.

Levenberg proved that S decreases in the initial iterations if $\lambda$ is first
large and then allowed to decrease (provided S does not have a stationary
point at $\mathbf{b}^{(k)}$). One recommendation of Levenberg was to make

$$\boldsymbol{\Omega} = \mathbf{I} \tag{7.6.14}$$

Incidentally, if $\boldsymbol{\Omega} = \mathbf{I}$ and $\lambda$ is very large, (7.6.13) can be written as

$$\mathbf{b}^{(k+1)} = \mathbf{b}^{(k)} + K \mathbf{X}^{T(k)} \mathbf{W} \mathbf{e}^{(k)}, \qquad K = \lambda^{-1} \tag{7.6.15}$$

which is called the method of steepest descent. This method gives a direction 
for the step but not a step size. Since the step size is arbitrary, this method 
can be very inefficient particularly as the minimum is approached. Hence 
the method of steepest descent is not recommended. Note, however, that it 
does not require the inverse of a matrix as do the Gauss and Levenberg 
methods. 
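The two limiting behaviors of a damped step can be seen numerically. The sketch below is illustrative only: it assumes the damped correction solves $(\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Omega})\,\mathbf{d} = \mathbf{X}^T\mathbf{e}$ with $\mathbf{W} = \mathbf{I}$ and a diagonal $\boldsymbol{\Omega}$, and it reuses the two-parameter model of Example 7.6.1 at the starting point (1, 2). Function names are hypothetical.

```python
import math

t_obs = [1.0, 2.0]
Y = [2.0, 3.0]


def normal_equations(b):
    """Return C = X^T X and g = X^T e for eta = b1*t + exp(-b2*t) with W = I."""
    C = [[0.0, 0.0], [0.0, 0.0]]
    g = [0.0, 0.0]
    for t, y in zip(t_obs, Y):
        x = [t, -t * math.exp(-b[1] * t)]          # sensitivities at this time
        e = y - (b[0] * t + math.exp(-b[1] * t))   # residual
        for i in range(2):
            g[i] += x[i] * e
            for j in range(2):
                C[i][j] += x[i] * x[j]
    return C, g


def damped_step(C, g, lam, omega=(1.0, 1.0)):
    """Solve (C + lam*Omega) d = g for diagonal Omega (2x2, Cramer's rule)."""
    a11 = C[0][0] + lam * omega[0]
    a22 = C[1][1] + lam * omega[1]
    a12 = C[0][1]
    det = a11 * a22 - a12 * a12
    return [(a22 * g[0] - a12 * g[1]) / det, (a11 * g[1] - a12 * g[0]) / det]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


C, g = normal_equations([1.0, 2.0])

if __name__ == "__main__":
    for lam in (0.0, 0.01, 1.0, 100.0, 1.0e6):
        d = damped_step(C, g, lam)
        print(f"lambda = {lam:9.2e}: step = {d}, length = {norm(d):.5f}")
    print("gradient direction X^T e =", g)
```

With $\lambda = 0$ the Gauss correction is recovered, and for very large $\lambda$ the step becomes $\lambda^{-1}\mathbf{X}^T\mathbf{e}$: short, and directed along steepest descent.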

Concomitant with $\boldsymbol{\Omega} = \mathbf{I}$, Levenberg suggested the two possibilities of (1)
letting $\lambda$ be a constant value and (2) varying $\lambda$ as the minimum is
approached. One possibility is that the $k$th value of $\lambda$ is


(7.6.16) 

We shall call this the unscaled Levenberg procedure. Note that $\lambda^{(k)}$ given
by (7.6.16) goes to zero as the minimum S is approached because each
component of $\mathbf{X}^T \mathbf{W} \mathbf{e}$ goes to zero at the minimum of S. (See point (i) in
Section 7.4.3.) As the minimum is approached S also decreases but the
numerator decreases more rapidly.

Box and Kanemasu [9] have made the point that the use of $\boldsymbol{\Omega} = \mathbf{I}$ in
(7.6.15) results in the method not being invariant under linear
transformations of the parameters, unlike the Gauss method which is invariant.
Another recommendation of Levenberg was to set $\boldsymbol{\Omega}$ equal to the
diagonal terms of $\mathbf{X}^T \mathbf{W} \mathbf{X}$ or

$$\boldsymbol{\Omega}_m = \operatorname{diag}[C_{11} \ \ C_{22} \ \ \cdots \ \ C_{pp}] \tag{7.6.17}$$

where $\mathbf{C}$ is given by (7.4.13). This choice for $\boldsymbol{\Omega}$ has the effect of making
the iteration problem invariant under scale changes in the parameters [9].
For this choice of $\boldsymbol{\Omega}$, the following expression for $\lambda$ has been suggested by
Davies and Whitting [10],

$$\lambda^{(k)} = \cdots \tag{7.6.18}$$

which also goes to zero as the minimum is approached. Using this
expression and (7.6.17) gives what we call the scaled Levenberg method.
When $\boldsymbol{\Omega}$ is set equal to $\mathbf{I}$ and when $\lambda$ is given [10] by

$$\lambda^{(k)} = \frac{3\, \mathbf{e}^{T(k)} \mathbf{W} \mathbf{X}^{(k)} \boldsymbol{\Omega}_m \mathbf{X}^{T(k)} \mathbf{W} \mathbf{e}^{(k)}}{\mathbf{e}^{T(k)} \mathbf{W} \mathbf{X}^{(k)} \mathbf{X}^{T(k)} \mathbf{W} \mathbf{e}^{(k)}} \tag{7.6.19}$$

we term the associated procedure the modified Levenberg method.
Though the Levenberg method and its modifications can remove
instability and reduce oscillations, it also can increase considerably the number
of iterations in a given case.
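The invariance under scale changes that comes from choosing $\boldsymbol{\Omega}$ as the diagonal of $\mathbf{X}^T\mathbf{W}\mathbf{X}$ can be demonstrated directly. The sketch below assumes $\mathbf{W} = \mathbf{I}$, a damped correction solving $(\mathbf{C} + \lambda\boldsymbol{\Omega})\,\mathbf{d} = \mathbf{X}^T\mathbf{e}$ with $\mathbf{C} = \mathbf{X}^T\mathbf{X}$, and uses the rounded sensitivities and residuals of Example 7.6.1. Replacing the second parameter by $\beta_2/c$ multiplies its sensitivity column by $c$; an invariant method must then return a second step component that is smaller by the same factor $c$. The value of $\lambda$, the factor $c$, and the function names are illustrative.

```python
def normal_equations(X, e):
    """C = X^T X and g = X^T e for a list-of-rows sensitivity matrix X."""
    p = len(X[0])
    C = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    g = [sum(r[i] * ei for r, ei in zip(X, e)) for i in range(p)]
    return C, g


def damped_step(C, g, lam, scaled):
    """Solve (C + lam*Omega) d = g; Omega = diag(C) if scaled, else identity."""
    w = [C[0][0], C[1][1]] if scaled else [1.0, 1.0]
    a11 = C[0][0] + lam * w[0]
    a22 = C[1][1] + lam * w[1]
    a12 = C[0][1]
    det = a11 * a22 - a12 * a12
    return [(a22 * g[0] - a12 * g[1]) / det, (a11 * g[1] - a12 * g[0]) / det]


# Rounded sensitivities and residuals of Example 7.6.1 at b = (1, 2).
X = [[1.0, -0.1353], [2.0, -0.03663]]
e = [0.8647, 0.9817]
c = 10.0                                     # rescaling of the second parameter
X_rescaled = [[r[0], c * r[1]] for r in X]

C, g = normal_equations(X, e)
Cs, gs = normal_equations(X_rescaled, e)

if __name__ == "__main__":
    for scaled in (True, False):
        d = damped_step(C, g, 1.0, scaled)
        ds = damped_step(Cs, gs, 1.0, scaled)
        # Map the rescaled step back to the original parameter for comparison.
        print("scaled Omega" if scaled else "Omega = I   ", d, [ds[0], c * ds[1]])
```

With the diagonal choice the two printed steps agree, while with $\boldsymbol{\Omega} = \mathbf{I}$ they differ markedly, which is the lack of invariance noted by Box and Kanemasu.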


7.6.3 Marquardt's Method

A method similar to Levenberg's is the well-known method due to
Marquardt [4]. This method uses (7.6.13) with $\boldsymbol{\Omega}$ given by $\boldsymbol{\Omega}_m$, which is
defined by (7.6.17), but Marquardt uses a different choice for $\lambda$ than does
Levenberg. Again if $\lambda\boldsymbol{\Omega}_m$ is large compared to $\mathbf{X}^T \mathbf{W} \mathbf{X}$, the parameter
correction is in the same direction as given by steepest descent, which does
not require that $|\mathbf{X}^T \mathbf{W} \mathbf{X}| \neq 0$. It is for this reason that the Levenberg and
Marquardt methods are helpful when $\mathbf{X}^T \mathbf{W} \mathbf{X}$ is poorly conditioned at the
starting parameter vector but is better conditioned in the neighborhood of
the least squares solution (Gallant [11]). Both methods also provide a
compromise between the steepest descent and Gauss methods, with the
initial iterations close to the steepest descent method and the final
iterations close to the Gauss method.





Marquardt proposed what Box and Kanemasu call the $(\lambda, \nu)$ algorithm in
which $\lambda^{(k)}$ is calculated from

$$\lambda^{(k)} = \frac{\lambda^{(k-1)}}{\nu} \tag{7.6.20}$$

The initial value of $\lambda\boldsymbol{\Omega}$, corresponding to the first iteration, is then
$\lambda\boldsymbol{\Omega} = \lambda^{(0)} \boldsymbol{\Omega}_m / \nu$ where $\nu$ is some constant greater than unity. This method
supposedly possesses the virtues of the steepest descent and Gauss
methods where each is most effective. Marquardt's recommendations have
been followed by many and have been incorporated in numerous computer
programs.

Though the Marquardt method has been widely used since the publica- 
tion of Marquardt’s paper in 1963, there are still some unresolved ques- 
tions regarding the effectiveness of this method compared to others. This is 
discussed further in the next section. Moreover, Box and Kanemasu [9] 
have presented an analysis that indicates there is no need to compromise 
the direction of a step to be between those given by steepest descent and 
Gauss methods. They demonstrate this by showing that the steepest 
descent and Gauss vectors have the same direction if the parameters are 
transformed into a linearly invariant metric. They also showed that the 
constrained minimization in this metric is merely equivalent to using a 
modification employing (7.6.1,2). Nevertheless, Gallant's observation given
above regarding the value of the Marquardt method when the initial
$\mathbf{X}^T \mathbf{W} \mathbf{X}$ matrix is poorly conditioned is valid.

7.6.4 Comparison of Methods 

In addition to the modifications of the Gauss method given above, 
doubtless many more will be suggested in the future. Moreover, many 
quite different approaches are presently available. These include deriv- 
ative-free methods [5], quasi-linearization [12], stochastic approximation 
[13], and invariant embedding [13]. For our purposes, however, some 
interpolation scheme such as that of Section 7.6.1 combined with a
sequential procedure (such as given in Section 7.8) is usually adequate,
particularly if the experiment is carefully designed as discussed in Chapter 8.

The purpose of this section is to provide comparisons of several methods. Assuming that all the methods converge to the same parameter vector, one of the most important considerations in selection of a method is the relative computer time needed. In many parameter estimation problems the model is a set of ordinary or partial differential equations which can require time-consuming finite-difference methods for solution. In such cases the time used in repeatedly solving the model is much greater than that used in performing the other requisite calculations in parameter estimation. Hence the number of evaluations of the model gives an approximate relative measure of the computer time required for a particular method.

Another criterion for comparison of methods is the power to solve difficult cases. Unfortunately, many of the more powerful procedures can take considerably more computer time than a relatively simple method such as the Gauss method. The relative amount of time depends on the problem. Indeed, there are cases for which the Gauss method does not converge. Usually, at the expense of more computer time and greater programming complexity, computer methods can be evolved to treat more difficult cases. Though there is an advantage in having more powerful computer programs available, such programs can never supplant careful design of experiments. Thus our tendency is to recommend more efficient though less powerful methods, not only to save computer time but also to encourage careful experiment design. In a case that is poorly designed, not only is the minimum S difficult to locate but the associated parameter values are probably very sensitive to the measurement errors. Good designs yield more accurate parameter estimates and require less computer time.

One way to compare methods is to apply them to a large variety of problems and to also vary the initial parameter estimates. Some authors have done this. Usually the cases are not randomly selected. Rather, there is a tendency to use as test cases those that have S functions which are difficult to minimize. One such case is given next.


7.6.4.1 Box-Kanemasu Example

The model investigated by Box and Kanemasu [9], along with their data, is

$$\eta = \beta_1 \beta_2 \xi_1 (1 + \beta_1 \xi_1 + 5000\,\xi_2)^{-1} \qquad (7.6.21)$$

The independent variables are $\xi_1$ and $\xi_2$. The errors in Y are assumed to be additive, to have zero mean, to have constant variance, and to be independent and normal, or in symbols, 11111-11. Ordinary least squares and maximum likelihood give the same parameter estimates in this case. The sensitivity coefficients can be given by

$$\frac{\beta_1}{\beta_2} X_1 = \frac{\beta_1 \xi_1 (1 + 5000\,\xi_2)}{(1 + \beta_1 \xi_1 + 5000\,\xi_2)^2}, \qquad X_2 = \frac{\beta_1 \xi_1}{1 + \beta_1 \xi_1 + 5000\,\xi_2} \qquad (7.6.22)$$

where $\beta_1 X_1/\beta_2$ is used rather than $X_1$ for its greater convenience. From (7.6.21) we can observe that $\eta$ is linear in $\beta_2$ and nonlinear in $\beta_1$. Also, the sensitivity coefficients as expressed by (7.6.22) can be plotted or tabulated versus $\beta_1 \xi_1$ for $\xi_2$ equal to 1 and 2; see Table 7.5. The initial $\beta_1$ and $\beta_2$ values chosen were 300 and 6, respectively, and the converged values are 716.955 and 0.944469. Then in the vicinity of the minimum, the maximum value of $\beta_1 \xi_1$ would be about 1500. Notice that in Table 7.5 the $X_1$ and $X_2$ sensitivities for $\beta_1 \xi_1$ less than 1000 are approximately proportional at both $\xi_2 = 1$ and 2. This means that the minimum of S is probably ill-defined. This is demonstrated by the long narrow valley shown in Fig. 7.11a; such cases might pose difficulty in convergence for the Gauss method.


Table 7.5 Table of Sensitivity Coefficients for Model Given by (7.6.21)

                       $\xi_2 = 1$            $\xi_2 = 2$
$\beta_1 \xi_1$      $X_1$      $X_2$       $X_1$      $X_2$
0                    0          0           0          0
250                  .0453      .0476       .0228      .0244
500                  .0826      .0909       .0453      .0476
1000                 .1389      .1667       .0826      .0909
2000                 .2041      .2857       .1389      .1667
3000                 .2344      .3750       .1775      .2308
100,000              .0453      .9524       .0827      .9091
$\infty$             0          1           0          1
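The entries of Table 7.5 can be checked numerically. The Python sketch below differentiates (7.6.21) analytically and tabulates the two sensitivities against $\beta_1 \xi_1$. The function names are illustrative only, and the scaling of the first sensitivity by $\beta_1/\beta_2$ is an assumption inferred from the tabulated values rather than something stated with the table.

```python
def scaled_sensitivities(b1_xi1, xi2):
    """Return ((beta1/beta2) * X1, X2) for eta = b1*b2*xi1 / (1 + b1*xi1 + 5000*xi2).

    ASSUMPTION: the first coefficient is scaled by beta1/beta2 so that both
    quantities depend only on the product beta1*xi1 and on xi2.
    """
    denom = 1.0 + b1_xi1 + 5000.0 * xi2
    x1_scaled = b1_xi1 * (1.0 + 5000.0 * xi2) / denom ** 2
    x2 = b1_xi1 / denom
    return x1_scaled, x2


def sensitivity_table(values=(0, 250, 500, 1000, 2000, 3000, 100000),
                      xi2_values=(1, 2)):
    """Rows of [beta1*xi1, X1(xi2=1), X2(xi2=1), X1(xi2=2), X2(xi2=2)]."""
    rows = []
    for v in values:
        row = [v]
        for xi2 in xi2_values:
            row.extend(round(s, 4) for s in scaled_sensitivities(v, xi2))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in sensitivity_table():
        print(row)
```

Under this assumed scaling the computed values agree with the tabulated ones to within about one unit in the fourth decimal place, except that the $X_1$ entry for $\beta_1 \xi_1 = 250$, $\xi_2 = 2$ evaluates to about 0.0238.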


Another indication that the minimum S is poorly defined can be
obtained by examining the $(X^T X)^{-1}$ matrix for the final b's; it is


C = (X^X)~' = 


2.33096X10^ 

symmetric 


-2.57548X10® 

2.79758X10^ 


The correlation coefficient of the estimators is approximated by
$C_{12}/(C_{11} C_{22})^{1/2} = -0.997909$. Since the absolute value of this number is
very near unity, the two parameters are shown to be highly correlated.

Figure 7.11b shows a comparison of results given by Box and Kanemasu [9] for the Marquardt method described in Section 7.6.3, the modified Gauss method involving halving and doubling mentioned in Section 7.6.1, and the Box-Kanemasu method of Section 7.6.1. The number of times, $n_f$, that the function $\eta$ had to be evaluated is plotted versus $\lambda_0$, the initial value of $\lambda$ chosen in the Marquardt method. (Note that the number of iterations is not equivalent to the number of function evaluations.)

Figure 7.11a Sum of squares contours for the example of Section 7.6.4.1. (Reprinted by permission of Prof. G. E. P. Box.)

The following conclusions can be drawn from Fig. 7.11b:

1. The Box-Kanemasu method is superior to any of those investigated.
2. The next best method is usually the halving-doubling method.
3. The Marquardt method is usually the slowest of those considered.

These conclusions should not be overgeneralized because only one particular example was considered. From theoretical considerations and from this example, however, Box and Kanemasu found that Marquardt's method was not superior to the Gauss method with the modifications using (7.6.1,2), with h replaced by $1/(1 + \lambda)$, $\lambda$ being the value that Marquardt would recommend.

Figure 7.11b Comparison of the Marquardt, halving and doubling, and Box-Kanemasu modifications of the Gauss method. (Reprinted by permission of Prof. G. E. P. Box.)

7.6.4.2 Bard Comparisons

Y. Bard [1] gave an excellent survey of 13 best known gradient methods including several modifications of the Gauss method. Bard found that several modifications of the Gauss method were better than the Marquardt method. In his book, Bard [5, p. 111] appears to favor a modification of the Box-Kanemasu method which he calls the interpolation-extrapolation method.


7.6.4.3 Davies and Whitting Comparison

In the paper by Davies and Whitting [10] a comparison is given of five
methods for seven problems considered by Jones [14]. The methods include
the Levenberg procedures using $\lambda$ given by (7.6.16) and (7.6.18).
There was negligible difference in the rate of convergence between the two
methods. They also used the modified Levenberg method, which used
(7.6.19) for $\Omega = I$ and another equation for $\lambda$ when $\Omega = \Omega_m$; again negligible
differences were observed in the rate of convergence. Another method
(SPIRAL) is due to Jones [14], who estimates the Levenberg parameter $\lambda$
in another manner. The first six columns of Table 7.6 contain the results
given by Davies and Whitting [10]. We have added the last two columns,
which were obtained using the Box-Kanemasu method ($\alpha = 1$, always) and
the modified version which reduces $\alpha$ if the condition given by (7.6.11) is
not satisfied For both methods sequential estimation was used with the 
initial P matrix being the large value of lO’l (or equivalently the small U of 
10"’l), this reduces difficulties when X^X is singular 

For all problems except Problem 5 the results of Table 7.6 indicate that the Gauss method is greatly superior to the Marquardt, SPIRAL, Levenberg, and Modified Levenberg methods. A weakness of the Gauss method is indicated by its inability to treat Problem 5; this was caused by the $X^T X$ matrix being initially singular. The Gauss method works for Problem 4, which has the same S function but different initial parameter estimates.
The unmodified and modified Box-Kanemasu methods (with U“ 10 ’I) 
compare very well with the Gauss method (based on the number of iterations in Table 7.6) for Problems 3, 4, 6, and 7 and are superior for Problem 5. Only one iteration is needed for Problems 6 and 7, which have a linear model. Hence these methods have the very desirable characteristic of not requiring extensive iterations for linear problems; notice, in contrast, that the other techniques necessitated from 9 to 78 iterations.

For Problem 4 the methods using the Box-Kanemasu equation are about equal to the Gauss method but much better for Problem 5; all the other methods are much less effective for Problems 4 and 5.

Table 7.6 Comparison of methods for the seven problems considered by Jones [14] (tabulated values not reproduced)

It is surprising that for Problems 1 and 2 the Gauss method converges in two iterations whereas all the others take many more. These two problems have the same S function but different initial parameter estimates, Problem 1 starting with $\beta_1 = -1.2$ and $\beta_2 = 1$. Regardless of the initial parameter values, the Gauss method results in the correct $\beta_1$ and $\beta_2$ values in exactly two iterations. The first iteration results in the correct value of $\beta_1$ but usually with $S^{(1)}$ being much larger than $S^{(0)}$. In the second iteration in the Gauss method S is reduced to zero. From this example (as in others) we see that $S^{(k+1)}$ being larger than $S^{(k)}$ does not necessarily mean that the Gauss method will encounter difficulty in convergence. In fact, in this problem the other methods were much less efficient.
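The two-iteration behavior described above can be reproduced with a short Python sketch. The sketch assumes that the S function is Rosenbrock's function, $S = 100(\beta_2 - \beta_1^2)^2 + (1 - \beta_1)^2$, which is consistent with the starting values $\beta_1 = -1.2$, $\beta_2 = 1$ but is an assumption, not a statement of the test problem actually used. All function names are illustrative.

```python
def residuals(b1, b2):
    """e = Y - eta for the ASSUMED S = e1**2 + e2**2 (Rosenbrock's function)."""
    return [10.0 * (b2 - b1 * b1), 1.0 - b1]


def sensitivities(b1, b2):
    """Rows of X = d(eta)/d(beta) = -d(e)/d(beta) for the residuals above."""
    return [[20.0 * b1, -10.0], [1.0, 0.0]]


def sum_of_squares(b1, b2):
    return sum(e * e for e in residuals(b1, b2))


def gauss_step(b1, b2):
    """One unmodified Gauss iteration: b_new = b + (X^T X)^-1 X^T e."""
    e = residuals(b1, b2)
    X = sensitivities(b1, b2)
    a11 = sum(r[0] * r[0] for r in X)
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X)
    g1 = sum(r[0] * ei for r, ei in zip(X, e))
    g2 = sum(r[1] * ei for r, ei in zip(X, e))
    det = a11 * a22 - a12 * a12
    return (b1 + (a22 * g1 - a12 * g2) / det,
            b2 + (a11 * g2 - a12 * g1) / det)


def run(b1=-1.2, b2=1.0, iterations=2):
    """Return [(b1, b2, S), ...] for the start and each Gauss iteration."""
    history = [(b1, b2, sum_of_squares(b1, b2))]
    for _ in range(iterations):
        b1, b2 = gauss_step(b1, b2)
        history.append((b1, b2, sum_of_squares(b1, b2)))
    return history


if __name__ == "__main__":
    for b1, b2, S in run():
        print(b1, b2, S)
```

For this assumed function the first step fixes $\beta_1$ while S increases sharply, and the second step brings S to zero (to rounding), which matches the behavior described in the text.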

In conclusion, although the Gauss method is very competitive in many problems, it may not work because $X^T X$ is temporarily singular. For that reason we suggest that U be made equal to CI, where C is a very small value compared to the diagonal terms of $X^T X$ (or that $P_0 = KI$ be used with large K in the sequential method as discussed in Section 6.7). In addition, the Box-Kanemasu method is recommended in order to be more certain of finding the parameter values minimizing S, even though in some cases the Gauss method would be more efficient. In cases when the Box-Kanemasu method does not converge, the modification of it that we have suggested is recommended, provided one is convinced that a unique minimum point exists.

7.7 MODEL BUILDING AND CONFIDENCE REGIONS

Most of the developments discussed in this section utilize an assumption of a multivariate normal distribution of errors. In particular, this assumption is used in determining confidence intervals and regions for parameters, locating confidence intervals for Y, and applying the F test for model building. The assumptions used in this section include additive, zero mean, normal errors in Y. The independent variables are assumed to be errorless. The assumptions are designated 11--1-11.

Unlike linear estimation, the expressions below are all approximate, with the approximation being better for cases which are less nonlinear than others. For a measure of the nonlinearity, see Beale [15] and Guttman and Meeter [16]. The expressions may be approximations due to (1) the sensitivity coefficients being functions of estimated parameters, (2) linear approximations to derive equations such as (7.7.1), and (3) the usual estimates for $\sigma^2$.

7.7.1 Approximate Covariance Matrix of Parameters

The approximate covariance matrix of the parameters has different forms depending upon the method of estimation. In general, the expressions are similar to comparable linear estimation cases.

For ordinary least squares estimation with the assumptions denoted 11----11, the approximate covariance matrix of $b_{LS}$ is analogous to (6.2.11),

$$\mathrm{cov}(b_{LS}) \approx (X^T X)^{-1} X^T \psi X (X^T X)^{-1} = P_{LS} \qquad (7.7.1)$$

where X is the sensitivity matrix, which is a function of $b_{LS}$, and where $\psi = E(\varepsilon \varepsilon^T)$. [Recall that in OLS estimation using (7.4.6) we set W = I and U = 0.] With the additional standard assumptions of uncorrelated and constant variance measurement errors and $\sigma^2$ being unknown (1111-011), the estimated covariance of $b_{LS}$ simplifies to

$$\mathrm{cov}(b_{LS}) \approx (X^T X)^{-1} s^2, \qquad s^2 = \frac{(Y - \hat{Y})^T (Y - \hat{Y})}{n - p} \qquad (7.7.2)$$

When maximum likelihood estimation is used and the measurement errors are normal, the approximate covariance matrix of $b_{ML}$ is

$$\mathrm{cov}(b_{ML}) \approx (X^T \psi^{-1} X)^{-1} = P_{ML} \qquad (7.7.3)$$

The assumptions are 11--1-11 (see Section 6.1.5). If $\psi$ is known to within the multiplicative constant $\sigma^2$, that is, $\psi = \sigma^2 \Omega$ (11--1011), then

$$\mathrm{cov}(b_{ML}) \approx (X^T \Omega^{-1} X)^{-1} s^2 \qquad (7.7.4)$$

where $s^2$ can be estimated using

$$s^2 = \frac{(Y - \hat{Y})^T \Omega^{-1} (Y - \hat{Y})}{n - p} \qquad (7.7.5)$$

7.7.2 Approximate Correlation Matrix

The approximate OLS and ML correlation matrices can be obtained using the above equations. The ij element of the correlation matrix is given by

$$r_{ij} = P_{ij} (P_{ii} P_{jj})^{-1/2} \qquad (7.7.6)$$

where $P_{ij}$ is a term of (7.7.1), (7.7.2), or (7.7.3) depending on the case. The diagonal elements of r are all unity and the off-diagonal elements must be in the interval [-1, 1]. If $\psi$ is known only to within a multiplicative constant, the value of the constant need not be known for r since it cancels in (7.7.6). Whenever all the off-diagonal elements exceed 0.9 in magnitude, the estimates are highly correlated and tend to be inaccurate. Bacon [17] has suggested that when this is true, a simpler model form than the one originally proposed may be appropriate. A poor experimental design may also be responsible for the high correlations. In that case it is recommended that the sensitivity coefficients be examined and that the experimental design strategies discussed in Chapter 8 be employed. Box [18, 19] has shown, however, that high correlations among the parameters can be due to a large extent to the nature of the model itself and thus no experimental design could be expected to yield uncorrelated parameter estimates. See Problem 7.18.

A rule of thumb for anticipating the inaccuracy in calculation is given by Gallant [11]; he suggested that difficulty in computation may be encountered when the common logarithm of the ratio of the largest to smallest eigenvalues of r exceeds one half the number of significant decimal digits used by the computer. See also Appendix A.4.

Example 7.7.1

Use the Gallant criterion for the Box-Kanemasu example of Section 7.6.4.1.

Solution

For that example r can be written

$$r = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix} \quad \text{where } r_{12} = -0.997909$$

Using (d) and (e) of Example 6.8.1 gives $\lambda_1 = 1 - r_{12}$ and $\lambda_2 = 1 + r_{12}$. The Gallant criterion then is $\log[(1 - r_{12})/(1 + r_{12})] = 2.98$. Hence, in order to solve the Box-Kanemasu data, the Gallant criterion indicates that at least six significant figures be used in the calculation.
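The arithmetic of this example is easy to check. The following Python sketch evaluates the Gallant criterion for a 2 x 2 correlation matrix, using the fact that its eigenvalues are $1 \pm r_{12}$; the function names are illustrative, and reading the rule as "at least twice the criterion, rounded up" is an interpretation of the rule of thumb rather than part of it.

```python
import math


def gallant_criterion(r12):
    """log10 of the eigenvalue ratio of the correlation matrix [[1, r12], [r12, 1]].

    The eigenvalues are 1 - r12 and 1 + r12; |r12| must be less than 1.
    """
    lam_a, lam_b = 1.0 - r12, 1.0 + r12
    return math.log10(max(lam_a, lam_b) / min(lam_a, lam_b))


def min_significant_digits(criterion):
    """Smallest whole number of digits d for which criterion <= d / 2 (interpretation)."""
    return math.ceil(2.0 * criterion)


if __name__ == "__main__":
    c = gallant_criterion(-0.997909)
    print(round(c, 2), min_significant_digits(c))
```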

7.7.3 Approximate Variance of $\hat{Y}$

Approximate variances for the predicted values of the model $\hat{Y}$ for OLS and ML are given by expressions similar to (6.2.12) and (6.5.6).

7.7.4 Approximate Confidence Intervals and Regions

As frequently happens for nonlinear problems, there are several approaches for providing approximate confidence intervals and regions. Some of these are discussed below.

Consider first calculation of approximate confidence intervals. Use the linear approximation given by the Taylor series of (7.2.1) (which is used in the Gauss equation); then, analogous to (6.8.4), we have the approximate 100(1 - $\alpha$)% confidence interval

$$b_k \pm \text{est. s.e.}(b_k)\, t_{1-\alpha/2}(\nu); \qquad k = 1, 2, \ldots, p \qquad (7.7.7)$$

for any one of the parameters, where

$$\text{est. s.e.}(b_k) = P_{kk}^{1/2} \qquad (7.7.8)$$

and $\nu$ is the number of degrees of freedom, frequently n - p, associated with the estimate of $\sigma^2$. The term $P_{kk}$ is the kth diagonal term of P and represents the variance of $b_k$. For the assumptions indicated, (7.7.2) is used to find $P_{kk}$ for OLS and (7.7.4) for ML estimation. The expression given by (7.7.8) is approximate not only because $s^2$ is used for $\sigma^2$ in (7.7.2) and (7.7.4), but also because the sensitivity coefficients are approximate since they are functions of b.

Example 7.7.2

The 90% confidence interval is to be found for each of the parameters of the model and for the data

$$\eta = \beta_1 \exp(\beta_2 t)$$

    $t_i$    0    0    1
    $Y_i$    1    2    3

Assume that the standard assumptions designated by 11111011 are valid. The minimum sum of squares is 0.5 and occurs at $b_1 = 1.5$ and $b_2 = \ln 2 = 0.693$.

Solution

The sensitivities, sensitivity matrix, and $X^T X$ are

$$X_{i1} = \exp(\beta_2 t_i), \qquad X_{i2} = \beta_1 t_i \exp(\beta_2 t_i)$$

$$X^T(b) = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 0 & 3 \end{bmatrix}, \qquad X^T X = \begin{bmatrix} 6 & 6 \\ 6 & 9 \end{bmatrix}$$

Owing to the standard assumptions, OLS and ML give the same estimates and approximate covariance matrix of the parameters. Using (7.7.2) gives

$$P = \frac{1}{54 - 36} \begin{bmatrix} 9 & -6 \\ -6 & 6 \end{bmatrix} (0.5) = \begin{bmatrix} 1/4 & -1/6 \\ -1/6 & 1/6 \end{bmatrix}$$

From t tables we get $t_{.95}(1) = 6.31$ for 90% confidence (see Table 2.15). Then, for 90% confidence intervals, (7.7.7) and (7.7.8) yield

$$b_1 \pm (0.25)^{1/2}(6.31) = 1.5 \pm 3.16 \quad \text{or} \quad -1.66 < \beta_1 < 4.66$$

$$b_2 \pm (1/6)^{1/2}(6.31) = 0.693 \pm 2.58 \quad \text{or} \quad -1.89 < \beta_2 < 3.27$$
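The steps of this example can be traced in a few lines of Python. The sketch below assumes the data $t = (0, 0, 1)$, $Y = (1, 2, 3)$, which are the values implied by the sum of squares function used later for this model, and it obtains the t quantile from the closed form that holds only for one degree of freedom. The function and variable names are illustrative.

```python
import math


def confidence_intervals(t, y, b1, b2, t_value):
    """Approximate intervals (7.7.7) for eta = beta1*exp(beta2*t) under OLS."""
    # Sensitivity matrix rows [d eta/d beta1, d eta/d beta2] evaluated at (b1, b2).
    X = [(math.exp(b2 * ti), b1 * ti * math.exp(b2 * ti)) for ti in t]
    a11 = sum(x1 * x1 for x1, _ in X)
    a12 = sum(x1 * x2 for x1, x2 in X)
    a22 = sum(x2 * x2 for _, x2 in X)
    # Residual sum of squares R and s^2 = R / (n - p) with p = 2.
    R = sum((yi - b1 * math.exp(b2 * ti)) ** 2 for ti, yi in zip(t, y))
    s2 = R / (len(t) - 2)
    det = a11 * a22 - a12 * a12
    # Diagonal terms of (X^T X)^-1 s^2, as in (7.7.2).
    P11, P22 = s2 * a22 / det, s2 * a11 / det
    half1 = t_value * math.sqrt(P11)
    half2 = t_value * math.sqrt(P22)
    return (b1 - half1, b1 + half1), (b2 - half2, b2 + half2)


# With one degree of freedom the t distribution is the Cauchy distribution,
# so its 0.95 quantile is tan(pi * (0.95 - 0.5)), about 6.31.
T_95_1DOF = math.tan(math.pi * 0.45)

if __name__ == "__main__":
    print(confidence_intervals((0.0, 0.0, 1.0), (1.0, 2.0, 3.0),
                               1.5, math.log(2.0), T_95_1DOF))
```

For other degrees of freedom the quantile would have to come from a table or a statistics library; only the one-degree-of-freedom case has this simple form.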

Generally, confidence intervals provide poor approximations to the confidence region even for linear cases. This is even more true for this nonlinear case, as is shown below. In addition to the approximations that are present in this example, the reader should be aware that this example was constructed for pedagogical simplicity and not because it is a realistic example. We can expect accurate results from the equations given above for real cases only when the assumptions are nearly valid and when the number of observations n is large.

Gallant has given a Monte Carlo study [11] for a certain nonlinear model with four parameters that showed small differences using the above method for determining the confidence intervals compared to a more exact method (which involves the likelihood ratio). He indicated, however, that when P is an ill-conditioned matrix the likelihood ratio method can yield considerably shorter confidence intervals than those using the above method, which is based on the asymptotic normality of the least squares estimator. We will describe this more powerful (but in application more time-consuming) procedure in relation to the confidence regions.

Let us now consider two methods of finding confidence regions. Again a number of standard assumptions are needed, including knowledge of the probability density, which we shall assume to be normal. Assume that the conditions denoted 11--1011 are valid. Then, using a Taylor series approximation for $\eta$, the 100(1 - $\alpha$)% confidence region for both OLS and ML estimation is given by

$$(b - \beta)^T (P^*)^{-1} (b - \beta) = p s^2 F_{1-\alpha}(p, n - p) \qquad (7.7.9)$$

where for OLS and ML estimation $(P^*)^{-1}$ is

$$(P^*_{LS})^{-1} \equiv X^T X (X^T \Omega X)^{-1} X^T X \qquad (7.7.10a)$$

$$(P^*_{ML})^{-1} \equiv X^T \Omega^{-1} X \qquad (7.7.10b)$$

In (7.7.10a) the sensitivity matrices are functions of $b_{LS}$ whereas those in (7.7.10b) depend upon $b_{ML}$. In giving (7.7.9, 10), results of Section 6.8.3 are used along with (6.2.11) and (6.5.5). The approximate confidence contours given by (7.7.9) are ellipsoids.

A more natural confidence region is found using a likelihood ratio [20]. It is the nonlinear analogue to the F statistic given by (6.2.24),

$$\frac{[S(\beta) - R]/p}{R/(n - p)} = F_{1-\alpha}(p, n - p) \qquad (7.7.11)$$

where p is present instead of q because a confidence region is needed for all the parameters. (R is the minimum S.) Usually (7.7.11) does not produce ellipsoids in parameter space. The contours are along the lines of constant likelihood ratio but the confidence level 1 - $\alpha$ is approximate [15, 16]. The approximation enters in (7.7.11) because the numerator and denominator no longer contain independent $\chi^2$ distributions. If we are dealing with a model which is nonlinear in the parameters, computation of confidence contours using (7.7.11) is more difficult and time-consuming than that using (7.7.9), in which a linear approximation is incorporated. For example, if there are two parameters, a separate numerical search problem is involved to obtain $\beta_2$ for each choice of $\beta_1$.

Example 7.7.3

The 50, 75, and 90% confidence regions are to be found for $\beta_1$ and $\beta_2$ for the model and data of Example 7.7.2. Use both methods given above.

Solution

Method Based on (7.7.9)

For this method $(P^*_{LS})^{-1}$ is $X^T X$ since $\Omega = I$ for the assumptions given. The value of $s^2$ is $R/(n - p) = 0.5/(3 - 2) = 0.5$. Then using the $X^T X$ matrix of Example 7.7.2 in (7.7.9) gives

$$\begin{bmatrix} b_1 - \beta_1 & b_2 - \beta_2 \end{bmatrix} \begin{bmatrix} 6 & 6 \\ 6 & 9 \end{bmatrix} \begin{bmatrix} b_1 - \beta_1 \\ b_2 - \beta_2 \end{bmatrix} = 2(0.5) F_{1-\alpha}(2, 1) \qquad (a)$$

Finding the confidence region using this expression is readily accomplished using the results of Example 6.8.1. Note that C of that example is the square matrix in (a) given above. From (d) and (e) of Example 6.8.1 the eigenvalues are found to be $\lambda_1 = [15 - (153)^{1/2}]/2 = 1.31534$ and $\lambda_2 = [15 + (153)^{1/2}]/2 = 13.68466$. Then using (g)-(j) of Example 6.8.1 gives

$$e = \begin{bmatrix} -0.788205 & 0.615412 \\ 0.615412 & 0.788205 \end{bmatrix}$$

Since e is orthonormal and since $p s^2 = 1$, we can write, analogous to (l) and (m),

$$(b_1 - \beta_1)_{maj} = \pm 0.68726\, F_{1-\alpha}^{1/2}(2, 1), \qquad (b_2 - \beta_2)_{maj} = \mp 0.53660\, F_{1-\alpha}^{1/2}(2, 1)$$

$$(b_1 - \beta_1)_{min} = \pm 0.16636\, F_{1-\alpha}^{1/2}(2, 1), \qquad (b_2 - \beta_2)_{min} = \pm 0.21307\, F_{1-\alpha}^{1/2}(2, 1)$$

The three confidence regions can be found by using these equations with the F values, which are $F_{.50}(2, 1) = 1.5$, $F_{.75}(2, 1) = 7.5$, and $F_{.90}(2, 1) = 49.5$. They are shown in Fig. 7.12 as dashed ellipses that are centered at $\beta_1 = 1.5$ and $\beta_2 = 0.693$. Notice that the 90% confidence region extends as far as $(b_1 - \beta_1)_{maj} = \pm 0.68726(49.5)^{1/2} = \pm 4.84$ and $(b_2 - \beta_2)_{maj} = \mp 3.78$. Since 4.84/1.5 = 3.22 and 3.78/0.693 = 5.45, the confidence region is relatively large and the parameters must be considered inaccurate or ill-determined. If the confidence region were sufficiently small so that

$$|(b_1 - \beta_1)_{maj}| \ll |b_1| \quad \text{and} \quad |(b_2 - \beta_2)_{maj}| \ll |b_2|$$

we would say that the parameters are well-determined.

Figure 7.12 Likelihood ratio and approximate confidence contours for 50, 75, and 90%. S is given in Example 7.7.3 as equation (b).

From the above equations the ratio of the major to minor axes of the ellipses is found to be $(\lambda_2/\lambda_1)^{1/2} = 3.23$, which indicates that the ellipses are neither extremely narrow nor approach a circle. The major axis forms an angle of $\tan^{-1}(e_{12}/e_{11}) = -37.98°$ with the $b_1 - \beta_1$ axis. Along this angle the S function varies most slowly. Another way of thinking about the major axis is to note that it can be described by

$$\delta\beta_2 = (e_{12}/e_{11})\, \delta\beta_1 = -0.78083\, \delta\beta_1 \quad \text{or} \quad \delta\beta_2 + 0.78083\, \delta\beta_1 = 0$$

This means that we can estimate the sum $\beta_2 + 0.78\beta_1$ more accurately than $\beta_2$ or $\beta_1$.

One should distinguish between the ill-determined condition provided by the confidence region and difficulty in convergence indicated by high parameter correlations or, equivalently, by a large value of the Gallant criterion. Difficult convergence is related to the character of $X^T X$ (and thus near linearity of sensitivity coefficients), but not to the accuracy of the measurements (the $\sigma^2$ value). The confidence region depends on both. For related discussion, see reference 21.
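The eigenvalue calculation behind the approximate ellipses can be sketched in Python as follows. The routine handles only a symmetric 2 x 2 matrix with a nonzero off-diagonal term, which is all this example needs; the function name and the returned keys are illustrative, and the sign of each axis vector is arbitrary.

```python
import math


def ellipse_axes(a11, a12, a22, rhs):
    """Semi-axes of x^T A x = rhs for A = [[a11, a12], [a12, a22]].

    A must be symmetric positive definite with a12 != 0.
    """
    mean = 0.5 * (a11 + a22)
    gap = math.hypot(0.5 * (a11 - a22), a12)
    lam_small, lam_large = mean - gap, mean + gap
    # Eigenvector belonging to the smaller eigenvalue (direction of the major axis).
    vx, vy = a12, lam_small - a11
    norm = math.hypot(vx, vy)
    ux, uy = vx / norm, vy / norm
    major = math.sqrt(rhs / lam_small)
    minor = math.sqrt(rhs / lam_large)
    return {
        "eigenvalues": (lam_small, lam_large),
        "major": (major * ux, major * uy),
        "minor": (-minor * uy, minor * ux),
        "angle_deg": math.degrees(math.atan2(uy, ux)),
    }


if __name__ == "__main__":
    # p * s^2 * F with p = 2, s^2 = 0.5 and the 90% value F = 49.5.
    print(ellipse_axes(6.0, 6.0, 9.0, 2 * 0.5 * 49.5))
```

Calling the routine with `rhs = 1` gives the coefficients that multiply $F^{1/2}$ in the axis expressions of the example.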


Method Based on (7.7.11)

For this equation the S function is needed; it is

$$S = (1 - \beta_1)^2 + (2 - \beta_1)^2 + (3 - \beta_1 e^{\beta_2})^2 \qquad (b)$$

Because of the relative simplicity of this function, we can solve for $\exp(\beta_2)$ in terms of $\beta_1$ and S; then taking the natural logarithm of $\exp(\beta_2)$ gives for $\beta_2$

$$\beta_2 = \ln\left[\frac{3 \pm (S - 5 + 6\beta_1 - 2\beta_1^2)^{1/2}}{\beta_1}\right] \qquad (c)$$

In more realistic examples a search for $\beta_2$ would be needed for fixed $\beta_1$ and S. Since R = 0.5, n = 3, and p = 2, (7.7.11) becomes

$$S = F_{1-\alpha}(2, 1) + 0.5 \qquad (d)$$

By introducing this value of S into (c) and solving for $\beta_2$ for a range of $\beta_1$ values, we can obtain the confidence regions shown as solid lines in Fig. 7.12. Unlike the dashed confidence regions, which are all ellipses with the same ratio of major to minor diameters, these are quite different in shape, particularly as S becomes large or, equivalently, as the percent confidence approaches 100. It is significant, however, that at the smallest S value (corresponding to 50%) the true shape of the confidence region does approach the same ellipses given by (7.7.9). From Fig. 7.12 note that a likelihood confidence region tends to be larger than the corresponding approximate region. If a likelihood ratio contour had been inside the associated approximate contour, then the latter would be suggested as a conservative estimate of the likelihood ratio confidence region. Though this condition is not satisfied in this case, it may be in some cases. See Bard [5, p. 209].

For better designed experiments and for more observations (usually $n \gg p$) the two methods show much better agreement. For an example, see the Box and Hunter paper [22].

It is instructive to make a comparison of a confidence region constructed using the confidence intervals given in Example 7.7.2 with the confidence regions shown in Fig. 7.12. Consider the 90% case, which covers the region $-1.66 < \beta_1 < 4.66$ and $-1.89 < \beta_2 < 3.27$. This region contains some of the 90% confidence regions and additional areas, but there are also areas in the more correct confidence regions that are not covered. This indicates that a confidence region constructed from confidence intervals may not be a good approximation of the approximate regions; a poor approximation to the likelihood ratio confidence region is provided, particularly as the percent confidence approaches 100.
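Points on a likelihood ratio contour for this example can be generated directly from the closed-form solution for $\beta_2$. The Python sketch below assumes the sum of squares function $S = (1 - \beta_1)^2 + (2 - \beta_1)^2 + (3 - \beta_1 e^{\beta_2})^2$ of this example and the contour level $S = F_{1-\alpha}(2, 1) + 0.5$; the function names are illustrative, and a general model would need a numerical search in place of the closed form.

```python
import math


def sum_of_squares(beta1, beta2):
    """S for the three observations of the example."""
    return ((1.0 - beta1) ** 2 + (2.0 - beta1) ** 2
            + (3.0 - beta1 * math.exp(beta2)) ** 2)


def beta2_on_contour(beta1, S):
    """Both branches of beta2 for a given beta1 and contour level S.

    Returns an empty list where the contour does not reach this beta1.
    """
    disc = S - 5.0 + 6.0 * beta1 - 2.0 * beta1 ** 2
    if disc < 0.0 or beta1 == 0.0:
        return []
    roots = []
    for sign in (1.0, -1.0):
        arg = (3.0 + sign * math.sqrt(disc)) / beta1
        if arg > 0.0:                      # the logarithm needs a positive argument
            roots.append(math.log(arg))
    return roots


if __name__ == "__main__":
    level = 1.5 + 0.5                      # 50% contour: F.50(2, 1) = 1.5
    for beta1 in (1.0, 1.5, 2.0):
        print(beta1, beta2_on_contour(beta1, level))
```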

7.7.5 Model Building Using the F Test

The F statistic given by (6.2.24),

$$F = \frac{[R(b_1^*, \beta_2^*) - R(b_1, b_2)]/q}{R(b_1, b_2)/(n - p)} \qquad (7.7.12)$$

and used for confidence regions can also be used for nonlinear model building. As pointed out below (7.7.11), this expression is approximate for nonlinear models. The assumptions denoted 11--1011 are used in connection with (7.7.12). Recall that (7.7.12) implies two groups of parameters: $\beta_1$ is assumed needed and $\beta_2$ may or may not be needed. There are p parameters in all, with $\beta_1$ having p - q and $\beta_2$ having q. The objective is to see if the $\beta_2$ parameters should be included in the model. $\beta_2^*$ may contain values suggested by a competing model. (See discussion following Theorem 6.2.4.) When all p parameters are simultaneously estimated, the minimum sum of squares is designated $R(b_1, b_2)$; $R(b_1^*, \beta_2^*)$ is the minimum found with $\beta_2$ fixed at $\beta_2^*$. If the standard assumptions of 11111011 are valid, OLS can be used, but if the errors neither have a constant variance nor are independent (11001011), then maximum likelihood estimation should be used.

The statistic F of (7.7.12) is utilized by comparing its value with $F_{1-\alpha}(q, n - p)$. If $F > F_{1-\alpha}(q, n - p)$, then we have an indication that the parameters $\beta_2$ are needed in the model and should not be the values implied by $\beta_2^*$.



Example 7.7.4

Using results of the estimates of parameters for the cooling billet problem, determine at the 95% confidence level which model to select from those given by (7.5.11, 12). Assume that the standard assumptions indicated by 11111011 are valid.

Solution

The F test can be used in this example in a manner that is very similar to that used for linear problems. Using the sums of squares given in Table 7.4 we can construct Table 7.7. The F values are compared with $F_{.95}(1, n - p) = 4.54$, 4.60, and 4.67 for n - p equal to 15, 14, and 13, respectively. Since 0.0077 is less than 4.67 but 672.8 is not less than 4.6, we have an indication that the two parameters $\beta_1$ and $\beta_2$ in (7.5.11) are needed but the $\beta_3$ parameter is not.


Table 7.7 Sum of Squares and F Statistic for Example 7.7.4

Model   No. of       Degrees of                  Mean Square,
No.     Parameters   Freedom      R              $s^2 = R/(n-p)$   $\Delta R$    F
1       1            15           38.73731       2.582
2       2            14           0.7896890      0.05641           37.9476       672.8
3       3            13           0.7892203      0.06071           0.000469      0.0077

7.8 SEQUENTIAL ESTIMATION FOR MULTIRESPONSE DATA 

There are a number of advantages of sequential estimation including the 
development of simpler, yet more general computer programs and the 
possibility of providing more insight into model building. For other 
advantages, see Section 6.7.7. 

The process is considered to be sequential with time because dynamic experiments are so common. However, there may be cases when one might wish to have some other independent variable, such as position, to be the sequential variable. At each time the experiment is considered to involve m (≥ 1) different responses. The m different responses could include different dependent variables such as temperature, concentration, velocity, or voltage; they could also represent measurements from m similar sensors.

Two sequential methods are given in this section. Both utilize the Gauss method and can incorporate the Box-Kanemasu modification. The first sequential method is simply a direct adaptation of (7.4.6). It is recommended when a sequential method is being used to reduce the memory requirements in a computer program and the sequential values of the parameters are not of interest.

The second method is similar to the linear sequential method of Section 6.7. The measurements must either be actually independent or be such that they can be treated as if they are. The latter situation is the case, for example, when OLS is used. Moreover, autoregressive and moving average correlated errors can be analyzed by using certain differences which are uncorrelated. Provided the errors are or can be treated as being uncorrelated, this method is recommended when m < p because the order of the matrix inversion is reduced. Only a scalar needs to be inverted if m = 1. Advantage can be taken of this fact by renumbering the observations as if they were all at different times even though physically m > 1.

Prior information can be included through the use of $\mu$ and U.

7.8.1 Assumptions

The standard assumptions of additive, zero mean, and noncorrelated errors are used. The errors have a known covariance matrix. The independent variables are considered to be nonstochastic. The parameters are assumed to be constant, that is, not time variable or random. These assumptions are designated 11-1-111. The assumptions regarding the measurement errors can be written as

$$Y = E(Y \mid \beta) + \varepsilon = \eta + \varepsilon \qquad (7.8.1)$$

$$E(\varepsilon) = 0 \qquad (7.8.2)$$

$$E[\varepsilon_j(i)\, \varepsilon_k(l)] = \sigma_j^2(i) \ \text{for } i = l,\ j = k; \qquad = 0 \ \text{otherwise} \qquad (7.8.3)$$

The measurement vector Y is composed of the m-dimensional vectors Y(i),

$$Y = \begin{bmatrix} Y(1) \\ Y(2) \\ \vdots \\ Y(n) \end{bmatrix} \quad \text{where} \quad Y(i) = \begin{bmatrix} Y_1(i) \\ \vdots \\ Y_m(i) \end{bmatrix} \qquad (7.8.4)$$

The associated error vector $\varepsilon$ and dependent variable vector $\eta$ are similarly composed of the m-dimensional vectors $\varepsilon(i)$ and $\eta(i)$, where i = 1, ..., n. The covariance matrix of the errors for the given assumptions is the diagonal matrix $\psi$,

$$\psi = \mathrm{diag}[\Phi(1) \ldots \Phi(n)] \quad \text{where} \quad \Phi(i) = \mathrm{diag}[\sigma_1^2(i) \ldots \sigma_m^2(i)] \qquad (7.8.5)$$

Consistent with the above assumptions is estimation using the assumptions 11---011, which are the assumptions used in Gauss-Markov estimation for linear models. With the additional assumption of normal errors, the results are equivalent to those obtained using maximum likelihood. The sequential procedure given below could also be used with OLS estimation by simply replacing $\Phi$ by I.

7.8.2 Direct Method 

Equations (7.4.6) apply to the multiresponse case. In order to reduce the 
plethora of subscripts and superscripts we write (7.4.6) as 

b* = b-t-P[X^4>-'e-t-U(ia-b)] (7.8.6) 

e=Y-ri, P=[X^a>-'X-hU]~' (7.8.7) 

where X and ij are evaluated at b which stands for b^*\ The vector b* is the 
same as in (7.4.6a). Recall that k is an iteration index which changes 
only after all the data (mXn observations) have been considered. 

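The sensitivity matrix in (7.8.6, 7) can be partitioned so that

    $\mathbf{X} = \begin{bmatrix} \mathbf{X}(1) \\ \mathbf{X}(2) \\ \vdots \\ \mathbf{X}(n) \end{bmatrix}$  where  $\mathbf{X}(i) = \begin{bmatrix} X_{11}(i) & X_{12}(i) & \cdots & X_{1p}(i) \\ X_{21}(i) & X_{22}(i) & \cdots & X_{2p}(i) \\ \vdots & \vdots & & \vdots \\ X_{m1}(i) & X_{m2}(i) & \cdots & X_{mp}(i) \end{bmatrix}$    (7.8.8)

Notice that $X_{jk}(i)$ is the sensitivity coefficient for the jth dependent
variable and kth parameter and at the ith time,

    $X_{jk}(i) = \dfrac{\partial \eta_j(i)}{\partial \beta_k}$    (7.8.9)

A typical term of the triple product $\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{X}$ of (7.8.6) can be indicated
by

    $[\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{X}]_{sl} = \sum_{i=1}^{n} \sum_{j=1}^{m} X_{js}(i)\, X_{jl}(i)\, \sigma_j^{-2}(i)$    (7.8.10)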

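Rather than waiting to evaluate the sums in $\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{X}$ after a complete
iteration, a running summation can be performed to reduce the computer
memory requirements. Let us define $C_{sl}(i)$ as being

    $C_{sl}(i) = U_{sl} + \sum_{r=1}^{i} \sum_{j=1}^{m} X_{js}(r)\, X_{jl}(r)\, \sigma_j^{-2}(r)$    (7.8.11)

Then the sl term of the $\mathbf{P}^{-1}$ matrix can be constructed from

    $C_{sl}(i+1) = C_{sl}(i) + \sum_{j=1}^{m} X_{js}(i+1)\, X_{jl}(i+1)\, \sigma_j^{-2}(i+1)$    (7.8.12)

which forms a recursion relation with the starting value of $C_{sl}(0) = U_{sl}$.
Both s and l go from 1 to p.

A similar recursion relation can be given for the components of the
vector $\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{e}$; one component is

    $d_s(i+1) = d_s(i) + \sum_{j=1}^{m} X_{js}(i+1)\, \sigma_j^{-2}(i+1)\, e_j(i+1)$    (7.8.13)

which has the starting condition of

    $d_s(0) = \sum_{l=1}^{p} U_{sl}\,(\mu_l - b_l)$    (7.8.14)

The range of s is 1 to p.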

The procedure to find the new parameter vector is first to use (7.8.12)
and (7.8.13). Then the (i+1)th matrix of $\mathbf{P}$ is found by finding the inverse,

    $\mathbf{P}(i+1) = \begin{bmatrix} C_{11}(i+1) & \cdots & C_{1p}(i+1) \\ \vdots & & \vdots \\ C_{p1}(i+1) & \cdots & C_{pp}(i+1) \end{bmatrix}^{-1}$    (7.8.15)

Finally the parameter vector is found using

    $\mathbf{b}^*(i+1) = \mathbf{b} + \mathbf{P}(i+1)\,\mathbf{d}(i+1)$    (7.8.16)

Note that $\mathbf{b}^*(i+1)$ is the parameter vector at the (i+1)th time in the kth
iteration, where k has the same meaning as in (7.4.6). The vector $\mathbf{b}$ in (7.8.16)
is $\boldsymbol{\mu}$ for the first iteration and $\mathbf{b}^*(n)$ of the preceding iteration for the
subsequent iterations on k.

If only the final converged parameter values are of interest, it is
necessary to perform the matrix inversion just once for each complete iteration; in
other words, it would be necessary to just find $\mathbf{P}(n)$ and then use

    $\mathbf{b}^*(n) = \mathbf{b} + \mathbf{P}(n)\,\mathbf{d}(n)$    (7.8.17)

At the end of an iteration the $\mathbf{b}$ vector is replaced by $\mathbf{b}^*(n)$ and another
iteration on k is begun with $\mathbf{X}$ being evaluated at the new $\mathbf{b}$. The process is
continued until convergence is attained.
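The running summations of the direct method can be sketched in a few lines of code. The following is only an illustrative sketch, not the authors' program: the function names `solve` and `direct_pass`, the list-of-lists data layout, and the toy two-response linear model are all assumptions made here. The sums $C_{sl}$ and $d_s$ are accumulated one time at a time, starting from $\mathbf{U}$ and $\mathbf{U}(\boldsymbol{\mu} - \mathbf{b})$, and the linear system is solved once at the end of the pass.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def direct_pass(b, mu, U, X, e, var):
    """One complete iteration of the direct method.

    X[i][j][s] : sensitivity of response j to parameter s at time i
    e[i][j]    : residual Y - eta for response j at time i (evaluated at b)
    var[i][j]  : error variance sigma_j^2(i)
    """
    p = len(b)
    C = [list(row) for row in U]                         # C(0) = U
    d = [sum(U[s][l] * (mu[l] - b[l]) for l in range(p))  # d(0) = U (mu - b)
         for s in range(p)]
    for Xi, ei, vi in zip(X, e, var):                    # running sums over time
        for j in range(len(ei)):
            w = 1.0 / vi[j]
            for s in range(p):
                d[s] += Xi[j][s] * w * ei[j]
                for l in range(p):
                    C[s][l] += Xi[j][s] * Xi[j][l] * w
    step = solve(C, d)                                   # P(n) d(n) without forming P
    return [b[s] + step[s] for s in range(p)]


# Hypothetical toy problem: two responses, eta_1 = b1 + b2 t and eta_2 = b1 - b2 t,
# exact data generated with beta = (2, 3), no prior information (U = 0).
times = [1.0, 2.0, 3.0]
beta_true = [2.0, 3.0]
X = [[[1.0, t], [1.0, -t]] for t in times]
b0 = [0.0, 0.0]
e = [[sum(row[s] * (beta_true[s] - b0[s]) for s in range(2)) for row in Xi] for Xi in X]
var = [[0.1, 0.4] for _ in times]
U0 = [[0.0, 0.0], [0.0, 0.0]]
b_new = direct_pass(b0, [0.0, 0.0], U0, X, e, var)
```

Because the toy model is linear and the data are exact, a single pass recovers the generating parameters; for a nonlinear model the pass would be repeated with X and e re-evaluated at the new b.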


7.8.3 Sequential Method Using the Matrix Inversion Lemma 

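The usually preferred sequential method involves the matrix inversion
lemma. Using expressions in the preceding section we can write

    $\mathbf{b}^*(i+1) = \mathbf{b} + \mathbf{P}(i+1)[\mathbf{d}(i) + \mathbf{X}^T(i+1)\,\boldsymbol{\Phi}^{-1}(i+1)\,\mathbf{e}(i+1)]$    (7.8.18)

    $\mathbf{P}(i+1) = [\mathbf{P}^{-1}(i) + \mathbf{X}^T(i+1)\,\boldsymbol{\Phi}^{-1}(i+1)\,\mathbf{X}(i+1)]^{-1}$    (7.8.19)

where typical terms are

    $[\mathbf{X}^T(i+1)\,\boldsymbol{\Phi}^{-1}(i+1)\,\mathbf{e}(i+1)]_s = \sum_{j=1}^{m} X_{js}(i+1)\, e_j(i+1)\, \sigma_j^{-2}(i+1)$    (7.8.20)

    $[\mathbf{X}^T(i+1)\,\boldsymbol{\Phi}^{-1}(i+1)\,\mathbf{X}(i+1)]_{sl} = \sum_{j=1}^{m} X_{js}(i+1)\, X_{jl}(i+1)\, \sigma_j^{-2}(i+1)$    (7.8.21)

The range of s and l in these last two equations is 1 to p.

Using (6.7.5a) and (6.7.5b) we can write, analogous to (6.7.6a,b,c,f),

    A.  $\mathbf{A}(i+1) = \mathbf{P}(i)\,\mathbf{X}^T(i+1)$    (7.8.22a)
    B.  $\boldsymbol{\Delta}(i+1) = \boldsymbol{\Phi}(i+1) + \mathbf{X}(i+1)\,\mathbf{A}(i+1)$    (7.8.22b)
    C.  $\mathbf{K}(i+1) = \mathbf{A}(i+1)\,\boldsymbol{\Delta}^{-1}(i+1)$    (7.8.22c)
    E.  $\mathbf{P}(i+1) = \mathbf{P}(i) - \mathbf{K}(i+1)\,\mathbf{X}(i+1)\,\mathbf{P}(i)$    (7.8.22d)

Utilizing (6.7.5b) and (7.8.22c) in (7.8.18) yields after some rearrangement

    D.  $\mathbf{b}^*(i+1) = \mathbf{b}^*(i) + \mathbf{K}(i+1)\{\mathbf{e}(i+1) - \mathbf{X}(i+1)[\mathbf{b}^*(i) - \mathbf{b}]\}$    (7.8.22e)

The above equations are to be used in the order A, B, C, D,
and E, indicated on the left.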
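The rearrangement leading to (7.8.22e) can be written out in a few lines. The sketch below assumes the usual matrix-inversion-lemma quantities $\mathbf{A} = \mathbf{P}(i)\mathbf{X}^T$, $\boldsymbol{\Delta} = \boldsymbol{\Phi} + \mathbf{X}\mathbf{A}$, $\mathbf{K} = \mathbf{A}\boldsymbol{\Delta}^{-1}$, and $\mathbf{P}(i+1) = \mathbf{P}(i) - \mathbf{K}\mathbf{X}\mathbf{P}(i)$, with $\mathbf{X}$, $\boldsymbol{\Phi}$, and $\mathbf{e}$ all evaluated at index $i+1$; it is offered as a check on the algebra rather than as the authors' own derivation.

```latex
\begin{align*}
% (7.8.16) written at index i gives d(i) in terms of b*(i):
\mathbf{b}^*(i) &= \mathbf{b} + \mathbf{P}(i)\,\mathbf{d}(i)
  \;\Longrightarrow\;
  \mathbf{d}(i) = \mathbf{P}^{-1}(i)\,[\mathbf{b}^*(i) - \mathbf{b}] \\
% from the P update:
\mathbf{P}(i+1)\,\mathbf{P}^{-1}(i) &= \mathbf{I} - \mathbf{K}\mathbf{X} \\
% using X A = Delta - Phi:
\mathbf{P}(i+1)\,\mathbf{X}^T\boldsymbol{\Phi}^{-1}
  &= (\mathbf{A} - \mathbf{K}\mathbf{X}\mathbf{A})\,\boldsymbol{\Phi}^{-1}
   = \mathbf{A}\boldsymbol{\Delta}^{-1}(\boldsymbol{\Delta} - \mathbf{X}\mathbf{A})\,\boldsymbol{\Phi}^{-1}
   = \mathbf{A}\boldsymbol{\Delta}^{-1} = \mathbf{K} \\
% substitute both results into (7.8.18):
\mathbf{b}^*(i+1)
  &= \mathbf{b} + (\mathbf{I} - \mathbf{K}\mathbf{X})\,[\mathbf{b}^*(i) - \mathbf{b}] + \mathbf{K}\mathbf{e} \\
  &= \mathbf{b}^*(i) + \mathbf{K}\,\{\mathbf{e} - \mathbf{X}\,[\mathbf{b}^*(i) - \mathbf{b}]\}
\end{align*}
```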








It is important to understand that three different parameter vectors
appear in (7.8.22e). First, $\mathbf{b}^*(i+1)$ is the estimated vector for the (i+1)th
time and at the (k+1)th iteration. Second, $\mathbf{b}^*(i)$ is for the ith time and
(k+1)th iteration. Finally, $\mathbf{b}$ is the vector found at the nth time of the kth
iteration; the vector $\mathbf{b}$ does not change during an iteration over the time
index i. Note that $\mathbf{P}(i+1)$, $\mathbf{X}(i+1)$, $\mathbf{d}(i+1)$, and $\mathbf{e}(i+1) = \mathbf{Y}(i+1) - \boldsymbol{\eta}(i+1)$
are functions of $\mathbf{b}$ and not $\mathbf{b}^*(i)$.

When there is prior information, each iteration starts with $\mathbf{b}^*(0) = \boldsymbol{\mu}$ and
$\mathbf{P}(0) = \mathbf{V}_\beta = \mathbf{U}^{-1}$. A detailed example is given in Section 7.9.1.

When there is no prior information a special starting procedure is
needed at i = 1 for each of the k iterations. As suggested in Section 6.7, we
can select $\mathbf{P}(0)$ to be a diagonal matrix with relatively large terms. Since $\mathbf{P}$
represents the approximate variance-covariance matrix of the parameter
vector $\boldsymbol{\beta}$, a $\mathbf{P}(0)$ with large diagonal components indicates poor prior
information. In general, $\mathbf{P}(0)$ is fixed from iteration to iteration (i.e., over
the k index). Provided no problems are encountered with round-off, $\mathbf{P}(0)$
could have any diagonal values that are large (by a factor of 10*, say) 
compared to the corresponding values of $\mathbf{P}(n)$.

7.8.3.1 Sequential Method for m = 1, p Arbitrary

An important application of the previous analysis is for m = 1 because the
only inverse that must be found (that of $\boldsymbol{\Delta}$) becomes the inverse of a
scalar. As shown in Section 6.7.5, uncorrelated multiresponse measurements
can be renumbered so that the m = 1 analysis can be employed.
From (7.8.22) evaluated for m = 1 and p arbitrary we find

    $A_u(i+1) = \sum_{k=1}^{p} X_k(i+1)\, P_{uk}(i), \qquad u = 1, 2, \ldots, p$    (7.8.23a)

    $\Delta(i+1) = \sigma^2(i+1) + \sum_{k=1}^{p} X_k(i+1)\, A_k(i+1)$    (7.8.23b)

    $K_u(i+1) = A_u(i+1)/\Delta(i+1), \qquad u = 1, 2, \ldots, p$    (7.8.23c)

    $H(i+1) = e(i+1) - \sum_{k=1}^{p} X_k(i+1)\,[b_k^*(i) - b_k]$    (7.8.23d)

    $b_u^*(i+1) = b_u^*(i) + K_u(i+1)\, H(i+1), \qquad u = 1, \ldots, p$    (7.8.23e)

    $P_{uv}(i+1) = P_{uv}(i) - K_u(i+1)\, A_v(i+1), \qquad u, v = 1, \ldots, p$    (7.8.23f)

The above scalar equations should be used in the order given. These
equations can be programmed in a straightforward manner, but one must
remember that three b vectors are involved and that the $\mathbf{P}(0)$ matrix does
not vary from iteration to iteration. Only two sets of b storage locations are
needed, however; one is for $\mathbf{b} = \mathbf{b}^*(n)$ and the other is for $\mathbf{b}^*(i)$ and $\mathbf{b}^*(i+1)$,
since the latter replaces the former as calculations proceed. Since m = 1,
only one subscript is needed for X^(/+ 1) and none is needed for 
When programming, the i + 1 index need not be carried.

The sequential procedure given by (7.8.23) reduces to that given by the
linear analysis given by (6.7.8) by setting $\mathbf{b} = \mathbf{0}$.

For MAP estimation the starting value of $\mathbf{b}$ is $\boldsymbol{\mu}$, and the starting $\mathbf{P}(0)$
matrix is $\mathbf{U}^{-1} = \mathbf{V}_\beta$ for each iteration. All the sensitivity coefficients for the
first iteration (which includes i = 1, 2, ..., n) and each $\eta(i+1)$ are evaluated
using $\mathbf{b} = \boldsymbol{\mu}$. For subsequent iterations $\mathbf{X}$ and $\boldsymbol{\eta}$ are evaluated at $\mathbf{b}^*(n)$, the
final vector of the previous iteration. See the Section 7.9.1 example.

When there is no prior information, follow the procedure described
below (7.8.22).
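The renumbering of uncorrelated multiresponse data into a single stream of scalar observations, which is what allows the m = 1 relations to be used, can be illustrated with a short sketch. The function name `renumber` and the nested-list layout are assumptions made for this illustration only.

```python
def renumber(X_blocks, e_blocks, var_blocks):
    """Flatten n blocks of m uncorrelated responses into n*m scalar observations.

    X_blocks[i][j]   : row of p sensitivity coefficients for response j at time i
    e_blocks[i][j]   : residual for response j at time i
    var_blocks[i][j] : variance sigma_j^2(i)
    Returns a list of (X_row, e, sigma2) tuples, one per scalar observation,
    ordered response-within-time.
    """
    rows = []
    for Xi, ei, vi in zip(X_blocks, e_blocks, var_blocks):
        for j in range(len(ei)):
            rows.append((list(Xi[j]), ei[j], vi[j]))
    return rows


# Hypothetical data: n = 2 times, m = 2 responses, p = 2 parameters.
X_blocks = [[[1.0, 1.0], [1.0, -1.0]], [[1.0, 2.0], [1.0, -2.0]]]
e_blocks = [[0.1, 0.2], [0.3, 0.4]]
var_blocks = [[0.1, 0.4], [0.1, 0.4]]
scalar_obs = renumber(X_blocks, e_blocks, var_blocks)
```

Each tuple can then be fed, one at a time, to the scalar update; other orderings of the flattened observations are equally admissible when the errors are uncorrelated.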

7.8.4 Correlated Errors With Known Correlation Parameters 

If the measurement errors in $\mathbf{Y}$ are additive and correlated in time, the
above analyses can be used with slight modifications for the autoregressive
and moving average cases discussed in Section 6.9 and Appendix 6A,
provided the correlation parameters are known. The analyses can be
performed by replacing $\mathbf{X}$ by $\mathbf{Z} = \mathbf{D}^{-1}\mathbf{X}$ and $\mathbf{e}$ by $\mathbf{f} = \mathbf{D}^{-1}\mathbf{e}$. For example,
for first-order autoregressive errors, the terms in the matrices would be

    $Z_{jk}(i) = X_{jk}(i) - \rho_j(i)\, X_{jk}(i-1)$    (7.8.24a)

    $f_j(i) = e_j(i) - \rho_j(i)\, e_j(i-1)$    (7.8.24b)

for i = 1, 2, ..., n and where $X_{jk}(0)$ and $e_j(0)$ are defined to be equal to zero.
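A minimal sketch of the first-order autoregressive transformation follows. It assumes a constant correlation parameter rho (the text allows it to depend on j and i), and the function names are hypothetical. The example also generates AR(1) errors from a known input sequence to show that the transformation recovers that sequence.

```python
def ar1_generate(u, rho):
    """Build AR(1) errors eps(i) = rho * eps(i-1) + u(i), with eps(0) = 0."""
    eps, prev = [], 0.0
    for ui in u:
        prev = rho * prev + ui
        eps.append(prev)
    return eps


def ar1_filter(values, rho):
    """Apply z(i) = x(i) - rho * x(i-1) with x(0) = 0, as in (7.8.24)."""
    out, prev = [], 0.0
    for v in values:
        out.append(v - rho * prev)
        prev = v
    return out


u = [1.0, 1.0, 1.0]                 # uncorrelated inputs (illustrative values)
eps = ar1_generate(u, 0.5)          # correlated errors
recovered = ar1_filter(eps, 0.5)    # whitened sequence, equal to u
```

The same `ar1_filter` call would be applied to each column of sensitivity coefficients and to the residuals of each response before the sequential relations are used.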


7.9 EXAMPLES UTILIZING SEQUENTIAL ESTIMATION 

The purpose of this section is to present four different examples in which a
sequential method is used. In the first example, detailed results are given to
illustrate the sequential MAP method based on the matrix inversion
lemma. The physical problem involves heat conduction in a finite plate.
The second problem, involving an ordinary differential equation, is the
cooling billet problem previously considered. The third problem uses
simulated data for heat conduction in a semi-infinite body; the errors are
correlated. The final example is a realistic example involving about 1000
actual measurements; it is again for heat conduction in a finite plate.
Except for the second example each of the problems in this section
involves multiresponse data.


7.9.1 Simple MAP Example Involving Multiresponse Data

This example provides a detailed analysis of a maximum a posteriori
sequential estimation procedure for some multiresponse data. The data are
abstracted from the extensive values contained in Table 7.14. The physical
problem is that of a flat plate heated on one side and insulated on the
other. More extensive use of these data is discussed in Section 7.9.4, where
a more complete description of the problem is given. In that section the
results of estimation for the thermal conductivity, k, and the specific
heat-density product, c, are given; a finite difference procedure is used to
calculate the dependent variable (temperature). In this section a limited
amount of data is used, $K = k^{-1}$ and $\alpha\,(= k/c)$ are estimated, an analytical
solution is used for the dependent variable, and subjective prior
information is used.

The measurements for this case are given in Table 7.8 and other known
information is given in Table 7.9. The temperature measurements in Table
7.8 can be considered to be additive, zero mean, constant variance,
uncorrelated, and normal. Since at each time there are two measurements
at each location, an estimate of pure error can be obtained. Using the fact
that $V(\varepsilon_1 - \varepsilon_2) = 2\sigma^2$ (for assumptions 1111----), an estimate of $\sigma^2$ from the
data of Table 7.8 can be found to be about 0.0625. The errors in the time
measurements are negligible compared to those in the temperature rise,
that is, those above the initial temperature, which is about 81.69°F. The
assumptions given above and that of subjective prior information are
denoted 11111113.
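The pure-error calculation can be illustrated with the four duplicate pairs of Table 7.8. The sketch below pools the squared differences of the pairs, which is one straightforward way to apply $V(\varepsilon_1 - \varepsilon_2) = 2\sigma^2$; the pooling rule and the function name are assumptions made here, and the text does not say exactly how its value of about 0.0625 was obtained.

```python
# Duplicate temperature readings (deg F) at the same location and time,
# taken from Table 7.8: (T_1, T_4) at x = L and (T_5, T_6) at x = 0.
pairs = [
    (85.83, 85.37),  # x = L, 15.3 sec
    (94.73, 94.48),  # x = 0, 15.3 sec
    (87.46, 87.16),  # x = L, 18.3 sec
    (96.44, 96.03),  # x = 0, 18.3 sec
]


def pure_error_variance(pairs):
    """Pooled estimate of sigma^2 from duplicates: mean of (difference)^2 / 2."""
    return sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))


sigma2_hat = pure_error_variance(pairs)
```

With these four pairs the pooled value comes out near 0.067, the same size as the figure of about 0.0625 used in the analysis that follows.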

The temperatures corresponding to $\boldsymbol{\eta}$ are calculated using the solution
given by (8.5.25), which can also be differentiated with respect to K and $\alpha$
to get the sensitivity coefficients. For the times given in Table 7.8, the first
three terms at most are needed in the summation. Values for the sensitivity
coefficients of K and $\alpha$, designated $X_1$ and $X_2$, respectively, are given in
Table 7.10. The corresponding calculated values of temperature, denoted
$\eta$, are also given. All the values of $X_1$, $X_2$, and $\eta$ in Table 7.10 are based on
the parameter values of $K = \mu_1$ and $\alpha = \mu_2$. The information in Table 7.10 is
given in an order appropriate for sequential estimation with a single
observation being added at one "time." Other orders are possible, however.

Table 7.8 Data for Section 7.9.1 Problem

    Time     Time from Start      Measured temperatures
    (sec)    of Heating (hr)      T_1 (°F)   T_4 (°F)   T_5 (°F)   T_6 (°F)
                                  (x = L)    (x = L)    (x = 0)    (x = 0)
    15.3     12/3600              85.83      85.37      94.73      94.48
    18.3     15/3600              87.46      87.16      96.44      96.03

Table 7.9 Known Conditions for Section 7.9.1 Problem

    $\mu_1\,(= K) = 45^{-1} = 0.022222$ hr-ft-°F/Btu        $q = 9612$ Btu/ft²-hr
    $\mu_2\,(= \alpha) = 0.8$ ft²/hr                          $L = 1/12$ ft
    $V_{\beta,11} = 4 \times 10^{-6}$ (hr-ft-°F/Btu)²         $T_0 = 81.69$°F
    $V_{\beta,12} = V_{\beta,21} = 0$
    $V_{\beta,22} = 0.01$ ft⁴/hr²


Table 7.10 Sensitivity Coefficients and $\eta$ for the First Iteration
of the Section 7.9.1 Problem

    i    Time (hr)    x    T      X_1(i)    X_2(i)    η(i)
    1    12/3600      L    T_1    177.8     8.158     85.640
    2    12/3600      L    T_4    177.8     8.158     85.640
    3    12/3600      0    T_5    570.9     8.930     94.377
    4    12/3600      0    T_6    570.9     8.930     94.377
    5    15/3600      L    T_1    252.4     10.49     87.299
    6    15/3600      L    T_4    252.4     10.49     87.299
    7    15/3600      0    T_5    650.0     10.87     96.136
    8    15/3600      0    T_6    650.0     10.87     96.136


In Table 7.11 are the sequential values for the first iteration. These
values use the sensitivity coefficients given in Table 7.10 (except to more
significant figures). Notice that all the $X_k(i)$'s and $\eta(i)$ are evaluated for the
same values of $K = \mu_1$ and $\alpha = \mu_2$; the sensitivity coefficients and $\eta$ in Table
7.11 are not evaluated using the updated values of $b_1(i)$ and $b_2(i)$. The
sequential procedure used is given by (7.8.23), which is identical to that
given by (6.7.8) for the linear case. For the nonlinear case, however,
iteration is required.

The starting values of $b_1$, $b_2$, $P_{11}$, $P_{12}$, and $P_{22}$ for MAP analysis are
$\mu_1$, $\mu_2$, $V_{\beta,11}$, $V_{\beta,12}$, and $V_{\beta,22}$, respectively. These starting values are used
for every iteration, as can be observed from the starting conditions in Table
7.12, which is for the second iteration. In this iteration, however, all the
values of $X_1(i)$, $X_2(i)$, and $\eta(i)$ are found based on the values of the
previous iteration, $b_1^{(1)}(n) = 0.022654$ and $b_2^{(1)}(n) = 0.78912$.

Table 7.11 Sequential Estimation of K and $\alpha$ in Problem of Section 7.9.1,
First Iteration. Based on Parameter Values $K = \mu_1 = 0.022222$
and $\alpha = \mu_2 = 0.8$

    i    Time    x    T      b_1×10²    b_2        P_11×10⁶    P_12×10⁶    P_22×10⁶
         (sec)
    0                        2.2222     0.8        4           0           10000
    1    12      L    T_1    2.2380     0.81814    3.408       -67.89      2211
    2    12      L    T_4    2.2188     0.79603    3.386       -70.47      1915
    3    12      0    T_5    2.3074     0.78031    0.5543      -20.23      1024
    4    12      0    T_6    2.2836     0.78452    0.3988      -17.48      975.0
    5    15      L    T_1    2.2701     0.79407    0.3321      -12.79      645.0
    6    15      L    T_4    2.2815     0.78603    0.3030      -10.74      500.9
    7    15      0    T_5    2.2873     0.78492    0.2372      -9.48       476.8
    8    15      0    T_6    2.2654     0.78912    0.2064      -8.89       465.5

Table 7.12 Sequential Estimation of K and $\alpha$ in Problem of Section 7.9.1,
Second Iteration. Based on Parameter Values $K = b_1^{(1)}(n) = 0.022654$
and $\alpha = b_2^{(1)}(n) = 0.78912$

    i    Time    x    T      b_1×10²    b_2        P_11×10⁶    P_12×10⁶    P_22×10⁶
         (sec)
    0                        2.2222     0.8        4           0           10000
    1    12      L    T_1    2.2372     0.81793    3.446       -66.17      2103
    2    12      L    T_4    2.2188     0.79589    3.425       -68.63      1809
    3    12      0    T_5    2.3077     0.78075    0.5606      -19.87      979.6
    4    12      0    T_6    2.2838     0.78481    0.4039      -17.20      934.2
    5    15      L    T_1    2.2701     0.79421    0.3354      -12.53      615.6
    6    15      L    T_4    2.2816     0.78632    0.3056      -10.50      476.6
    7    15      0    T_5    2.2874     0.78525    0.2393      -9.28       454.2
    8    15      0    T_6    2.2654     0.78930    0.2083      -8.71       443.8

If this case had not been one involving prior information, then the initial
values of $b_1$ and $b_2$ for the second iteration would be simply the
final values of the first iteration, at which values the sensitivity coefficients
and $\eta$ are also evaluated. The $\mathbf{P}^{(k)}(0)$ values would not be different,
however (unless one wishes to use $\mathbf{P}^{(k)}(0)$ to introduce the Levenberg or
Marquardt modifications).

The iterations in general continue until negligible changes in $\mathbf{b}^{(k)}(n)$
occur. In this case only two iterations (k = 1) apparently are needed.
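The first iteration of this example can be retraced numerically with the scalar relations (7.8.23). The sketch below is an illustration under stated assumptions rather than the program used for the tables: the function name `sequential_pass` is hypothetical, the sensitivity coefficients and calculated temperatures are the rounded values of Table 7.10, the measurements are those of Table 7.8 taken in the order $T_1$, $T_4$, $T_5$, $T_6$ at each time, and $\sigma^2 = 0.0625$ is used for every observation. Because of the rounding, agreement with Table 7.11 can be expected only to about three or four significant figures.

```python
def sequential_pass(b, b_start, P0, X, eta, Y, sigma2):
    """One pass over i = 1..n of the scalar (m = 1) relations (7.8.23).

    b       : parameter vector at which X and eta were evaluated
    b_start : starting estimate b*(0) (the prior mean for MAP estimation)
    P0      : starting matrix P(0)
    Returns a list of (b*(i), P(i)) pairs, one per observation.
    """
    p = len(b)
    b_star = list(b_start)
    P = [row[:] for row in P0]
    rows = []
    for Xi, eta_i, Yi in zip(X, eta, Y):
        A = [sum(Xi[k] * P[u][k] for k in range(p)) for u in range(p)]        # (a)
        delta = sigma2 + sum(Xi[k] * A[k] for k in range(p))                  # (b)
        K = [A[u] / delta for u in range(p)]                                  # (c)
        H = (Yi - eta_i) - sum(Xi[k] * (b_star[k] - b[k]) for k in range(p))  # (d)
        b_star = [b_star[u] + K[u] * H for u in range(p)]                     # (e)
        P = [[P[u][v] - K[u] * A[v] for v in range(p)] for u in range(p)]     # (f)
        rows.append((b_star[:], [r[:] for r in P]))
    return rows


mu = [0.022222, 0.8]                          # prior means of K and alpha (Table 7.9)
V_beta = [[4.0e-6, 0.0], [0.0, 0.01]]         # prior covariance (Table 7.9)
X = [[177.8, 8.158], [177.8, 8.158], [570.9, 8.930], [570.9, 8.930],
     [252.4, 10.49], [252.4, 10.49], [650.0, 10.87], [650.0, 10.87]]
eta = [85.640, 85.640, 94.377, 94.377, 87.299, 87.299, 96.136, 96.136]
Y = [85.83, 85.37, 94.73, 94.48, 87.46, 87.16, 96.44, 96.03]

rows = sequential_pass(mu, mu, V_beta, X, eta, Y, sigma2=0.0625)
for i, (b_i, P_i) in enumerate(rows, start=1):
    print(i, b_i[0] * 1e2, b_i[1], P_i[0][0] * 1e6, P_i[0][1] * 1e6, P_i[1][1] * 1e6)
```

For the second iteration the same function would be called with `b` set to the final estimate of the first pass, with `X` and `eta` re-evaluated there, and with `b_start` and `P0` unchanged.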





7.9.2 Cooling Billet Problem 

The data for the cooling billet problem are given in Table 6.2; results of 
analyses of these data are given in Sections 6.2, 7.5.2, and 7.7.5. Though 
converged parameter values for a differential equation model are given in 
the last two sections, sequential results are not given. The purpose of this 
section is to demonstrate the power of a sequential procedure to provide 
insight for model building for the cooling billet case. 

Consider first results of the power series models for h given by (7.5.11).
The values of $b_1$ for Model 1 are depicted in Fig. 7.13. (Recall that Model
1 is $hA/\rho c V = \beta_1$.) The value of $b_1(i)$ is the approximation to $\beta_1$ using the
data for times through $t_i$; the estimate is approximate in that the sensitivity
coefficients were evaluated at the converged parameter values using all the
data. Because the $b_1$ values of Model 1 continually decrease in Fig. 7.13
and because $b_1(i)$ is the average until $t_i$, the actual time variation of h is
probably larger than that shown.

Parameters for Model 2 ($hA/\rho c V = \beta_1 + \beta_2 t$) are shown in Figs. 7.13 and
7.14. The $b_1$ values shown in Fig. 7.13 are relatively constant with time;
this is particularly true for the last five time steps. For the last half of the
time steps shown in Fig. 7.14, $b_2$ is relatively constant. This constancy of
parameters suggests adequacy of Model 2.


[Figure: $b_1$ (hr⁻¹), on a scale from about 2.7 to 2.95, plotted against the time step index i for Models 1, 2, and 3.]

Figure 7.13 Estimates of $\beta_1$ for Models 1, 2, and 3 for the cooling billet data.




[Figure: $b_2$ plotted against the time step index i for Models 2 and 3.]

Figure 7.14 Estimates of $\beta_2$ for Models 2 and 3 for the cooling billet data.


For Model 3 ($hA/\rho c V = \beta_1 + \beta_2 t + \beta_3 t^2$) the $b_1$ values are relatively
stable in Fig. 7.13, those of $b_2$ in Fig. 7.14 less so, and those in Fig. 7.15
showing $b_3$ are quite oscillatory. The oscillatory nature of $b_3$, passing
through zero several times, suggests that Model 3 is not appropriate.

Sometimes one wonders about the effects on the parameters of additional
data. Without obtaining the data one cannot be certain. However,
the sequential method applied to existing data should give insight into the
effects. Additional data would be expected to continue trends already
observed in the parameters. For example, Fig. 7.15 suggests that $b_3$ might
continue to oscillate about zero with decreasing amplitude.

The sequential results for $b_1$ and $b_2$ in Model 4, $hA/\rho c V = \beta_1 + \beta_2 (T - T_\infty)$,
are shown in Fig. 7.16. The $b_1$ values are relatively constant but the $b_2$
values are not. The 50% variation of $b_2$ in the last five steps suggests that
the model should be further examined by using additional data or that related
models should be tried. Physically the model is quite attractive but some
improvement is needed, such as adding an exponent n on $T - T_\infty$; this was
tried but was unsuccessful, as mentioned in Section 7.5.2, although additional
data might be needed also to estimate n.


7.9.2.1 Other Possible Models

In addition to the models mentioned above many more could be suggested.
For example, one extension of Models 1, 2, and 3 is to investigate
other terms such as Another approach ts to always use the linear m t 
model (Model 2) but to use different 0 paire (such as 0^ 0^ for the /ih 
lime interval) for successive time intervals (23 25] 













7.9.3 Semi-Infinite Body Heat Conduction Example

Consider a semi-infinite, heat-conducting body initially at the temperature
$T_0$ of 100°F (311 K) and whose surface temperature takes a step increase
to $T_\infty = 200$°F (366 K). The temperature is measured every 9 sec until 252
sec at the two locations $x_1 = 0.125$ in. (0.00318 m) and $x_2 = 0.25$ in. (0.00635
m), where x is measured from the heated surface. This example is of the
Monte Carlo type because the data are simulated using the known value of
thermal diffusivity $\alpha$, equal to 0.01 ft²/hr ($2.58 \times 10^{-7}$ m²/sec), and because
15 different sets of random errors are considered. The first five cases use
independent, normal errors with unit variance. The measurement errors for
the first case are taken from the first two columns of Table XXIII of [26],
the second case uses those from the next two columns, etc.

Correlated errors are obtained for the first-order autoregressive process
as described in Section 6.9.2; the error $\varepsilon_i$ at time $t_i$ is given by

    $\varepsilon_i = \rho\,\varepsilon_{i-1} + u_i$    (7.9.1)

where $\rho = 0.5$ for cases 6-10 and $\rho = 0.9$ for the remaining cases. The $u_i$
random variable values are the same values taken from [26] as mentioned
above.

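A mathematical statement of the physics of the problem is

    $\dfrac{\partial T}{\partial t} = \alpha\,\dfrac{\partial^2 T}{\partial x^2}$    (7.9.2)

    $T(0, t) = T_\infty, \qquad T(x, 0) = T_0, \qquad T(\infty, t) = T_0$    (7.9.3)

which has the exact solution

    $\dfrac{T(x,t) - T_0}{T_\infty - T_0} = \mathrm{erfc}\!\left[\dfrac{x}{2(\alpha t)^{1/2}}\right]$  where  $\mathrm{erfc}(z) = \dfrac{2}{\sqrt{\pi}} \int_z^\infty e^{-u^2}\,du$    (7.9.4)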
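The exact solution and the way simulated measurements of this kind might be generated can be sketched as follows. This is only an illustration: the function names are hypothetical, the starting error $\varepsilon_0 = 0$ is an assumption, and the random inputs come from Python's pseudo-random generator rather than from the published table of normal deviates used for the cases reported here.

```python
import math
import random

T0, T_INF = 100.0, 200.0                 # initial and surface temperatures, deg F
ALPHA_TRUE = 0.01                        # thermal diffusivity, ft^2/hr
X_FT = (0.125 / 12.0, 0.25 / 12.0)       # sensor depths, converted from inches to ft
TIMES_HR = [9.0 * i / 3600.0 for i in range(1, 29)]   # 9, 18, ..., 252 sec in hr


def temperature(x, t, alpha):
    """Exact solution (7.9.4) for the semi-infinite body."""
    return T0 + (T_INF - T0) * math.erfc(x / (2.0 * math.sqrt(alpha * t)))


def simulate(rho, seed=0):
    """Exact temperatures plus AR(1) errors, eps(i) = rho*eps(i-1) + u(i)."""
    rng = random.Random(seed)
    data = []
    for x in X_FT:
        eps = 0.0                        # assumed starting value eps(0) = 0
        series = []
        for t in TIMES_HR:
            eps = rho * eps + rng.gauss(0.0, 1.0)   # u(i): unit-variance normal
            series.append(temperature(x, t, ALPHA_TRUE) + eps)
        data.append(series)
    return data


exact = [[temperature(x, t, ALPHA_TRUE) for t in TIMES_HR] for x in X_FT]
noisy = simulate(rho=0.5)
```

Setting `rho=0` gives independent errors, while `rho=0.5` and `rho=0.9` give the two levels of correlation considered in this example.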


Note that $T(x,t)$ is a nonlinear function of $\alpha$. Because there is no
characteristic length or time in this problem, the dimensionless temperature
given by (7.9.4) and the $\alpha$ sensitivity [given by (8.5.4)] can be plotted
versus the single dimensionless variable $\tau = \alpha t / x^2$; see Fig. 8.12.

The parameter $\alpha$ was estimated using nonlinear ML sequential estimation
for the multiresponse data mentioned above (two locations, at each of
which 28 measurements were simulated). In most cases only three iterations
were needed to converge to $\hat{\alpha}$ after starting with the initial estimate of
0.006 ft²/hr, which is low compared to the true value of 0.01 ft²/hr.
Estimated $\alpha$ values for the 15 cases are given in the third column of
Table 7.13. The first five cases tabulated, which are for $\rho = 0$, are more
accurate on the average than the next five, which are for $\rho = 0.5$, which are
in turn more accurate (on the average) than the last five cases ($\rho = 0.9$). Hence
we can conclude that correlated errors can substantially reduce the
accuracy of the parameter estimates.


Table 7.13 Results for Estimating Thermal Diffusivity for Cases
with Noncorrelated and Correlated Errors

    Case    ρ      α̂×10⁴       R        s²       Z^T Z     est. s.e.(α̂)
                   (ft²/hr)                      ×10⁻⁶     ×10⁴
     1      0       99.71     59.39    1.080    197       0.740
     2      0      100.64     54.15    0.985    192       0.716
     3      0      100.17     58.71    1.067    195       0.740
     4      0       99.29     52.55    0.955    199       0.693
     5      0       98.56     56.49    1.191    202       0.768
     6      0.5     99.49     60.85    1.106    52.4      1.45
     7      0.5    101.88     53.76    0.977    49.7      1.40
     8      0.5    100.39     59.2     1.076    51.3      1.45
     9      0.5     98.69     Sin      0.958    53.4      1.34
    10      0.5     97.22     56.3     1.024    55.3      1.36
    11      0.9     98.6      64.67    1.176    4.275     5.25
    12      0.9    115.3      53.40    0.971    3.09      5.61
    13      0.9    101.7      59.37    1.079    4.00      5.19
    14      0.9     97.2      55.15    1.003    4.40      4.97
    15      0.9     91.0      61.90    1.125    5.06      4.72


The minimum sum of squares R also has a tendency to increase in
variability with increasing $\rho$. The values of R were calculated using the
converged values of $\hat{\alpha}$ in (7.3.1) with $\mathbf{U} = \mathbf{0}$ and $\mathbf{W} = \boldsymbol{\Omega}^{-1}$; rather than using
$\boldsymbol{\Omega}$ directly, however, it was introduced by replacing $\mathbf{X}$ by $\mathbf{Z}$ and $\mathbf{e}$ by $\mathbf{f}$ as
mentioned in Section 7.8.4.

The mean sum of squares was formed using $s^2 = R/(n - p)$, where $n = 2 \times
28 = 56$ and $p = 1$ in this example. The estimated standard error of $\hat{\alpha}$ was
found using (6.8.3), which can be written for this one-parameter case as

    est. s.e.$(\hat{\alpha}) = [\mathbf{Z}^T \mathbf{Z}]^{-1/2}\, s$    (7.9.5)

These values increase by a factor of about 7 from $\rho = 0$ to 0.9. In each set
of five cases for a given $\rho$ value, the true $\alpha$ value of $100 \times 10^{-4}$ ft²/hr is in
the interval given by $\hat{\alpha} \pm$ est. s.e.$(\hat{\alpha})$ in three of the five cases. This is
consistent with the 61% confidence interval that could be calculated using
the t distribution.
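The arithmetic behind the last two columns of Table 7.13 is easily retraced. The sketch below applies $s^2 = R/(n - p)$ and (7.9.5) to the tabulated values of R and $\mathbf{Z}^T\mathbf{Z}$ for one case at each level of $\rho$; the function name `std_error` and the dictionary layout are assumptions made for the illustration.

```python
import math

N_OBS, N_PAR = 56, 1        # n = 2 x 28 observations, p = 1 parameter

# (R, Z^T Z) taken from Table 7.13 for cases 1, 6, and 11 (rho = 0, 0.5, 0.9)
cases = {1: (59.39, 197e6), 6: (60.85, 52.4e6), 11: (64.67, 4.275e6)}


def std_error(R, ztz):
    """Return (s^2, est. s.e.) from s^2 = R/(n - p) and (7.9.5)."""
    s2 = R / (N_OBS - N_PAR)
    return s2, math.sqrt(s2 / ztz)


results = {case: std_error(R, ztz) for case, (R, ztz) in cases.items()}
for case, (s2, se) in sorted(results.items()):
    print(case, round(s2, 3), round(se * 1e4, 3))
```

The ratio of the standard error for case 11 to that for case 1 is about 7, in line with the factor quoted above.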





7.9.4 Analysis of Finite Heat-Conducting Body with Multiresponse Experimental Data

7.9.4.1 Description of Experiment

In this experiment two adjacent, identical Armco iron cylindrical specimens
(1 in. (0.0254 m) thick and 3 in. (0.0762 m) in diameter) were heated
by a single flat heater placed between them. The heat flow was in the axial
direction. All surfaces except those where the heater was located were
insulated. Four thermocouples (numbered 5, 6, 7, and 8) were carefully
attached to the heated surface of each specimen and four (numbered 1-4)
were attached in the opposite flat insulated surfaces. The sensors were
located at angles of 0, 90, 180, and 270°. The sensors at 0 and 180° were
electrically averaged, as were also those at 90 and 270°. This then provided
eight temperature histories, with four being at a heated surface and four
being at an insulated surface.

Before the start of an experiment the specimens were allowed to come to
a uniform temperature. The heating period lasted 15.3 sec, after which the
specimens attained a higher equilibrium temperature. For further discussion
of the experiments, see reference 27.

The data used to find parameters in this case are given in Table 7.14. An
IBM 1800 computer was used to digitize the analogue signals produced by
the thermocouples. Typical results for the heated and insulated surfaces
are shown in Fig. 7.17; the heated surface temperature increases only
during heating, whereas the insulated surface temperature increases after
heating and finally approaches the same equilibrium temperature as the
heated surface.


Table 7.14 Measured Temperatures for Finite Heat-Conducting Specimen

    Time     Temperature (°F) for Numbered Thermocouples
    (sec)    TC1      TC2      TC3      TC4      TC5      TC6      TC7      TC8
     0.3     81.67    81.37    8U6      81.37    81.49    81.58    81.69    81.68
     0.6     81.67    81.29    81.52    81.61    81.41    81.74    81.60    81.60
     0.9     81.75    81.45    81.52    81.45    81.49    81.66    81.44    81.77
     1.2     81.75    81.12    81.44    81.53    81.41    81.66    81.52    81.93
     1.5     81.59    81.45    81.60    81.61    81.33    81.66    81.60    81.60
     1.8     81.50    81.45    81.52    81.53    81.41    81.50    81.69    81.77
     2.1     81.50    81.37    81.52    81.61    81.33    81.66    81.60    81.60
     2.4     8M2      8IJ7     81.52    81.37    81.58    81.58    81.69    81.77
     2.7     81.47    81.37    Sl*t     S14S     SiSS     81.74    81.60    SitO
     3.0     81.59    81J9     81.44    81.61    81.58    81.58    i\M      81.68
     3.3     81.67    81.29    81.52    81.69    81.66    81.74    81.60    81.77

Table 7.14 (Cont.)

    Time     Temperature (°F) for Numbered Thermocouples
    (sec)    TC1      TC2      TC3      TC4      TC5      TC6      TC7      TC8
     3.6     81.59    80.96    81.36    81.45    82.96    83.21    83.24    83.64
     3.9     81.67    80.96    81.44    81.69    84.03    84.03    84.22    84.78
     4.2     81.59    80.80    81.28    81.53    85.01    84.68    85.11    85.19
     4.5     81.59    80.80    81.28    81.45    85.50    85.58    85.77    85.85
     4.8     81.42    80.96    81.36    81.37    85.99    85.83    86.34    86.42
     5.1     81.59    80.88    81.28    81.45    86.31    86.48    86.75    86.91
     5.4     81.67    80.96    81.36    81.61    86.80    86.81    87.16    87.07
     5.7     81.67    81.04    81.36    81.53    87.29    87.30    87.56    87.56
     6.0     81.75    81.04    81.36    81.45    87.62    87.62    87.97    87.89
     6.3     81.83    81.21    81.44    81.61    88.03    87.87    88.14    88.37
     6.6     81.91    81.21    81.60    81.53    88.36    88.27    88.54    88.62
     6.9     81.83    81.53    81.60    81.61    88.60    88.68    88.87    88.70
     7.2     81.99    81.61    81.85    81.77    89.01    88.93    89.36    89.19
     7.5     81.91    81.37    81.77    82.02    89.25    89.01    89.60    89.35
     7.8     82.16    81.78    81.93    82.02    89.50    89.25    89.77    89.60
     8.1     82.24    82.02    82.18    82.10    89.91    89.74    89.77    89.84
     8.4     82.40    82.02    82.09    82.26    90.07    89.91    90.18    90.09
     8.7     82.48    82.43    82.34    82.35    90.32    90.07    90.50    90.25
     9.0     82.65    82.68    82.42    82.35    90.56    90.23    90.83    90.33
     9.3     82.73    82.68    82.75    82.59    90.97    90.56    91.07    90.58
     9.6     82.81    82.43    82.58    82.59    91.21    90.64    91.16    90.99
     9.9     82.97    83.17    82.91    82.92    91.46    90.89    91.40    90.74
    10.2     83.14    83.25    83.07    82.92    91.62    91.29    91.65    90.99
    10.5     83.22    83.57    83.40    83.08    91.87    91.29    91.81    91.23
    10.8     83.46    83.74    83.48    83.33    92.03    91.46    92.22    91.39
    11.1     83.71    83.82    83.56    83.24    92.28    91.70    92.38    91.72
    11.4     83.71    84.31    83.73    83.57    92.69    92.03    92.46    91.72
    11.7     83.87    84.39    83.97    83.57    92.77    92.19    92.71    91.96
    12.0     84.03    84.72    84.22    83.73    92.93    92.27    92.87    92.21
    12.3     84.20    84.96    84.38    83.90    93.26    92.44    93.28    92.13
    12.6     84.36    84.88    84.38    84.22    93.34    92.60    93.52    92.29
    12.9     84.52    85.21    84.54    84.22    93.50    92.93    93.52    92.62
    13.2     84.52    84.96    84.79    84.63    93.67    93.25    93.77    92.78
    13.5     84.85    85.05    84.71    84.55    93.75    93.17    93.93    92.94
    13.8     85.10    85.05    84.87    84.63    94.07    93.58    94.18    92.94
    14.1     85.10    84.96    85.11    84.71    93.99    93.58    94.09    93.27
    14.4     85.42    85.05    85.28    84.96    94.24    93.91    94.42    93.52
    14.7     85.50    85.29    85.20    85.20    94.40    93.91    94.42    93.84
    15.0     85.75    85.37    85.28    85.04    94.56    93.91    94.75    94.17
    15.3     85.83    85.21    85.52    85.37    94.73    94.48    94.99    94.25
    15.6     85.75    85.62    85.85    85.28    94.97    94.48    95.07    94.41
    15.9     85.91    85.78    85.93    85.77    95.22    94.64    95.07    94.82
    16.2     86.16    86.03    85.85    85.94    95.30    94.56    95.40    94.82
    16.5     86.32    86.03    86.26    86.02    95.38    95.05    95.73    95.07



Table 7.14 (Cont.)

    Time     Temperature (°F) for Numbered Thermocouples
    (sec)    TC1      TC2      TC3      TC4      TC5      TC6      TC7      TC8
    16.8     86.57    86.11    86.34    86.10    95.71    95.13    95.89    95.23
    17.1     86.73    86.35    86.34    86.35    95.71    95.29    95.89    95.31
    17.4     87.06    86.43    86.58    86.51    96.03    95.54    95.97    95.47
    17.7     87.14    86.52    86.83    86.67    96.28    95.54    96.14    95.72
    18.0     87.22    86.52    86.91    86.83    96.52    95.70    96.30    95.96
    18.3     87.46    86.92    86.99    87.16    96.44    96.03    96.54    96.04
    18.6     87.63    87.01    87.32    87^      96.77    96.03    96.71    96.37
    18.9     87.71    87.17    87.40    87.32    95.87    95.29    95.81    95.47
    19.2     87.87    87.50    87.48    87.57    94.56    94.31    94.50    94.33
    19.5     87987    87.66    87.73    87.49    93.99    93.74    91.85    93.52
    19.8     88.12    87.66    88.05    87.90    93.67    93.25    93.36    93.19
    20.1     88.52    88.07    88.14    88.06    93.18    92.76    93.12    93.03
    20.4     88.61    88.07    88.22    88.06    92.93    92.52    92.79    92.45
    20.7     88.77    88.23    88.54    88.55    92.52    92.19    92.38    92.29
    21.0     88.77    88.56    88.54    88.30    92.36    91.95    92.14    92.05
    21.3     88.93    89.31    89.79    89.55    92.11    91.70    92.05    91.88
    21.6     89.18    88.39    88.79    88.63    91.79    91.62    91.89    91.80
    21.9     89.10    88.72    88.87    88.55    91.79    91.46    91.65    91.72
    22.2     89.26    88.64    89.11    88.96    91.54    91.38    91.40    91.31
    22.5     89.42    83.80    89.20    89.04    91.62    91.05    91.40    91.39
    22.8     89.59    88.88    89.44    89.12    91.46    90.89    91.24    91.15
    23.1     89.75    88.88    89.44    89.37    91.38    90.97    91.16    91.07
    23.4     89.91    89.13    89.69    89.37    91.13    90.81    91.07    90.74
    23.7     89.83    89.21    89.44    89.37    91.13    90.72    91.16    90.99
    24.0     89.91    89.29    89.52    89.37    91.13    90.89    90.91    90.50
    24.3     89.91    89.29    89.60    89.53    91.13    90.81    90.67    90.74
    24.6     90.08    89.37    89.77    89.53    90.89    90.72    90.75    90.74
    24.9     89.91    89.37    89.69    89.45    90.97    90.56    90.75    90.66
    25.2     90.16    89.37    89.77    89.61    90.64    90.56    90.75    90.74
    25.5     90.16    89.54    89.79    89.61    90.81    90.48    90.67    90.82
    25.8     90.08    89.54    89.85    89.77    90.72    90.48    90.67    90.58
    26.1     90.16    89.62    90.01    89.77    90.89    90.40    90.58    90.58
    26.4     90.24    89.70    89.85    89.85    90.56    90.32    90.58    90.66
    26.7     90.16    89.54    89.77    89.83    90.64    90.48    90.58    90.41
    27.0     90.16    89.62    89.93    89.85    90.64    90.32    90.50    90.41
    27.3     90.32    89.62    90.18    89.94    90.40    90.56    90.42    90.50
    27.6     90.24    89.62    89.85    89.85    90.56    90.40    90.42    90.17
    27.9     90.40    89.62    90.01    89.85    90.56    90.23    90.50    90.33
    28.2     90.24    89.62    90.01    89.69    90.56    90.32    90.58    90.50
    28.5     90.40    89.70    90.09    89.94    90.48    90.15    90.26    90.33
    28.8     90.40    89.70    90.09    89.77    90.40    90.15    90.26    90.50
    29.1     90.24    89.62    90.01    89.85    90.40    90.23    90.34    90.33
    29.4     g9?S     S9SS
    29.7     90.40    89.78    90.01    89.94    90.48    90.40    90.14    90.25
    30.0     90.48    89.62    89.93    89.77    90.48    90.32    90.34    90.50


Owing to the statistical design of the experiment there are two types of 
replicated measurements. First, until the heating starts at 3.3 sec, the 
temperatures are the same (within experimental error) for all times and 
thermocouples. Second, there are four sensors at both the heated and 
insulated surfaces. The standard deviations of the temperatures until 3.0 
sec for any given sensor is about 0.1 F° (0.06 K); the standard deviation of 
the average temperatures of the eight sensors is also about 0.1 °F (0.06 K). 
During heating and at the heated surface, the standard deviations of 
temperature at fixed times are almost 0.4F° (0.22 K). These values of the 
standard deviation provided estimates of the pure error. 

Even though the temperature calibration for the data shown in Table 
7.14 was carefully performed, there are slight differences in the initial 
temperatures readings given by the various sensors. In an attempt to 
reduce bias, corrections of —0.06, +0.18, +0.04, +0.03, +0.09, —0.09, 
-0.06, and — 0.17F° were respectively added to measurements given by 
sensors 1 through 8. 

Another consideration in the design of the experiment was to achieve a 
relatively large signal-to-noise ratio while still not having such a large 
variation in temperature that the parameters would change. A maximum
temperature rise of about 15 F° (8.33 K) satisfies both conditions; using the
pure error standard deviation value of 0.1 F° (0.06 K), the maximum
signal-to-noise ratio is 15/0.1 = 150.



Figure 7.17 Typical temperature histories for the finite heat-conducting specimens.






CHAPTER 7 MINIMIZATION OF SUM OF SQUARES FUNCTIONS 


7.9.4.2 Physical Model of Heat-Conducting Body

The temperature in each specimen can be mathematically described by

$$k \frac{\partial^2 T}{\partial x^2} = c \frac{\partial T}{\partial t} \qquad (7.9.6)$$

$$-k \frac{\partial T(0,t)}{\partial x} = q(t), \qquad \frac{\partial T(L,t)}{\partial x} = 0 \qquad (7.9.7)$$

$$T(x,0) = 81.54\,^{\circ}\mathrm{F} \qquad (7.9.8)$$

where k is thermal conductivity and c is the density-specific heat product.
In both specimens the coordinate x is measured from the heated surface.
The specimen thickness L was 1/12 ft (0.0254 m). The parameters are k
and c. The heat flux q(t) was zero until t = 3.3 sec and was zero after
t = 18.6 sec; the value of q(t) in the interval 3.3 < t < 18.6 sec was 2.67
Btu/ft²-sec (30,300 W/m²). One can derive the exact solution of this
problem in terms of infinite series. However, PROGRAM PROPERTY
(developed at MSU), which utilizes finite differences in the solution of
(7.9.6-7.9.8), was used to obtain the parameters.

The sensitivity coefficients shown in Fig. 7.18 were also obtained by
using finite differences as indicated by (7.10.1). Note that those for c are
always negative whereas those for k are both positive and negative. Hence
the sensitivities are not linearly dependent and no difficulty would be
anticipated in estimating the parameters.

7.9.4.3 Parameter Estimates 

For the data given, the ordinary least squares parameter estimates are
k = 43.343 Btu/hr-ft-F (75.014 W/m-K) and c = 55.596 Btu/ft³-F (3728.6
kJ/m³-C). If the data for only part of the interval had been used, the k and
c values would have been modified by the additive changes Δk and Δc
given in Fig. 7.19. Note that the change in k in the last third of the time
interval is less than 1% of the final k value; the c value changes about 2%
in the same interval. These small values suggest that the model is correct.
For times less than 7 sec the corrections become very large, which would be
expected since the sensitivity coefficients shown in Fig. 7.18 are
approximately linearly dependent for such small times.

The residuals for all eight temperature histories are graphed in Figs. 7.20
and 7.21. Evidently, the measurement errors cannot be considered to be
independent. From reference 28, p. 97, the mean number of runs (changes
of sign plus 1) of the residuals is about n/2 for independent observations.



Figure 7.19 Sequential corrections to the parameters for finite heat-conducting example.

Figure 7.20 Residuals for thermocouples 1-4, which are located at the insulated surface of
the specimen.

There are 89 observations for each thermocouple and the number of runs
varies from 4 to 23 with the average being 12.75. This value is considerably
less than 89/2 = 44.5. For first-order cumulative errors, the expected
number of runs is about √n, which is 9.4 in this case. Hence the measurement
errors would be more appropriately modeled as being cumulative rather
than as being independent. (An analysis for ρ similar to that in Section
6.9.5 modified for nonlinear parameters gives ρ ≈ 0.95.)
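
The runs count used above is easy to reproduce. The following is a minimal sketch, not the authors' program: it counts runs as sign changes plus one and compares simulated independent errors with simulated cumulative (random-walk) errors for n = 89. The function names, the unit-variance normal shocks, and the number of trials are all assumptions made for illustration. The simulation is only meant to show that cumulative errors give far fewer runs than n/2; it is not a check of the √n approximation itself.

```python
import random

def count_runs(residuals):
    """Number of runs = number of sign changes + 1 (exact zeros are skipped)."""
    signs = [1 if r > 0 else -1 for r in residuals if r != 0]
    if not signs:
        return 0
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return changes + 1

def independent(rng, n):
    """n independent, zero-mean normal errors (assumed unit variance)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def cumulative(rng, n):
    """Cumulative errors: each error is the previous one plus a new shock."""
    total, out = 0.0, []
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        out.append(total)
    return out

def mean_runs(make_errors, n, trials=2000, seed=0):
    """Average number of runs over repeated simulated residual sets."""
    rng = random.Random(seed)
    return sum(count_runs(make_errors(rng, n)) for _ in range(trials)) / trials

n = 89
runs_independent = mean_runs(independent, n)
runs_cumulative = mean_runs(cumulative, n)
print("independent errors:", runs_independent, " n/2 =", n / 2)
print("cumulative errors: ", runs_cumulative)
```

Applied to a measured residual history, `count_runs` gives the quantity that is compared with n/2 in the text.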

If the usual equation for estimating σ is used, we have

$$s = \left[ \frac{1}{mn - 2} \sum_{j=1}^{m} \sum_{i=1}^{n} e_j^2(i) \right]^{1/2}$$

where n = 89 and m = 8. The resulting s value is 0.339, which is different
from the pure error estimates, which were near 0.1, partly because the
residuals are highly correlated.

Figure 7.21 Residuals for thermocouples 5-8, which are located at the heated surface of the
specimen.

In finishing this analysis we should calculate the confidence region for k
and c. Since the confidence region is discussed above for other cases,
however, it is not given. Note that various basic assumptions should be
checked first. The standard assumptions of additive, zero mean, constant
variance, normal measurement errors seem reasonable. Also, the time and
location measurement errors are probably negligible and no prior
information is used. The assumption of independent errors is not appropriate.
These assumptions are designated 11101011. If a first-order autoregressive
model is used, ρ is found to be about 0.95. Using this ρ, the covariance
matrix of the errors, ψ, is approximated as indicated in Appendix 6A.
For this case, in which OLS is used, an approximate confidence
interval could be constructed using (7.7.9) and (7.7.10a) and as discussed
following these equations.

A final comment is made regarding the estimation procedure. Data from
all eight thermocouples were used and eight sets of residuals were found.
Because the measurements at x = 0 and L are both repeated four times,
the same parameter values would be found using the average values at
each time at these two locations. In other words, instead of having m = 8,
we would have m = 2. Justification for this simplification is given in Section
5.5.


7.10 SENSITIVITY COEFFICIENTS

There are several methods of determining the sensitivity coefficients. One
can evaluate them by generating sensitivity equations by differentiation of
the model equations. One can also use finite differences, which are simpler
in terms of computer programming and are usually adequate, although
there are cases when direct solution of the sensitivity equations is needed.
This section discusses both methods. Even though one may evaluate the
sensitivity coefficients utilizing finite differences, the study of the
sensitivity equations can yield insights into the sensitivity coefficients
themselves.

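7.10.1 Finite Difference Method

One finite difference approximation for the sensitivity for the ith
observation, jth parameter, and lth dependent variable is the forward
difference approximation

$$X_{lj}(i) = \frac{\partial \eta_l(i)}{\partial \beta_j} \approx \frac{\eta_l(i;\, b_1, \ldots, b_j + \delta b_j, \ldots, b_p) - \eta_l(i;\, b_1, \ldots, b_j, \ldots, b_p)}{\delta b_j} \qquad (7.10.1)$$

where δbⱼ is some relatively small quantity. One possible value of δbⱼ is
given by

$$\delta b_j = 0.0001\, b_j \qquad (7.10.2)$$

This simple procedure is frequently quite satisfactory. Note, however, that
if the initial value of bⱼ is chosen to be zero or if bⱼ approaches zero, (7.10.2)
can lead to difficulty.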
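
As an illustration of the forward difference procedure, the following minimal sketch builds the sensitivities with a relative step of 0.0001 and compares them with the analytical derivatives. The two-parameter model η = β₁/(1 + β₂t), the parameter values, and all function names are chosen here only for illustration; they are not taken from any particular program. A zero parameter value is rejected, since a relative step would then vanish.

```python
def eta(b, t):
    """Illustrative model: eta = b[0] / (1 + b[1] * t)."""
    return b[0] / (1.0 + b[1] * t)

def forward_sensitivities(model, b, times, rel_step=1e-4):
    """Forward-difference sensitivities; result[j][i] is d(eta_i)/d(b_j)."""
    base = [model(b, t) for t in times]
    columns = []
    for j in range(len(b)):
        db = rel_step * b[j]
        if db == 0.0:
            raise ValueError("b[%d] = 0 gives a zero step; choose db directly" % j)
        bp = list(b)
        bp[j] += db                      # perturb only the jth parameter
        columns.append([(model(bp, t) - y0) / db for t, y0 in zip(times, base)])
    return columns

times = [0.25, 0.5, 0.75, 1.0, 2.0, 3.0]
b = [377.0, 5.8]
X_fd = forward_sensitivities(eta, b, times)

# Analytical sensitivities of the illustrative model, for comparison.
X_exact = [[1.0 / (1.0 + b[1] * t) for t in times],
           [-b[0] * t / (1.0 + b[1] * t) ** 2 for t in times]]

max_rel_err = max(abs(fd - ex) / abs(ex)
                  for col_fd, col_ex in zip(X_fd, X_exact)
                  for fd, ex in zip(col_fd, col_ex))
print("largest relative error:", max_rel_err)
```

Only p + 1 model evaluations are needed for p parameters, which is the main attraction of the forward difference.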



Bard [5, p. 118] gives a brief discussion leading him to recommend the
value of

$$|\delta b_j| = \left( \frac{2 \epsilon |S|}{C_{jj}} \right)^{1/2} \qquad (7.10.3)$$


where ε is the relative error in the computed values of S and where Cⱼⱼ is
defined by (7.4.13). He also recommends that lower and upper limits be
placed on the values given by (7.10.3) such as 10⁻⁵|bⱼ| < δbⱼ < 10⁻²|bⱼ|. Two
sources of error contribute to inaccuracies in approximating $X_{lj}(i)$: (1) the
rounding error when two closely spaced values of η are subtracted, and (2)
the truncation error due to the inexact nature of (7.10.1). In order to
reduce the truncation error the constant 0.0001 in (7.10.2) is made small.
If, however, it is too small then the rounding error becomes important. A
more accurate finite difference approximation is the central difference
scheme,

$$X_{lj}(i) \approx \frac{\eta_l(i;\, b_1, \ldots, b_j + \delta b_j, \ldots, b_p) - \eta_l(i;\, b_1, \ldots, b_j - \delta b_j, \ldots, b_p)}{2\, \delta b_j} \qquad (7.10.4)$$

Unfortunately this approximation requires almost twice as many values of
η to be calculated as does (7.10.1). For this reason the use of (7.10.4) is not
recommended.

If there is uncertainty due to the use of (7.10.1), one could repeat the
final iteration by replacing (7.10.1) by a backward difference
approximation (replacing δbⱼ by -δbⱼ) and then averaging the parameter
estimates from the forward and backward difference approximations.
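
The trade-off between the forward and central schemes can be seen in a small numerical experiment. The sketch below is only an illustration under assumed values: the model η = β₁/(1 + β₂t), the point t = 1, and the parameter values are hypothetical choices. It also shows that averaging the forward and backward difference sensitivities reproduces the central difference; this is related to, but not the same as, averaging the parameter estimates as suggested above.

```python
def eta(b1, b2, t):
    """Illustrative model: eta = b1 / (1 + b2 * t)."""
    return b1 / (1.0 + b2 * t)

b1, b2, t = 377.0, 5.8, 1.0
exact = -b1 * t / (1.0 + b2 * t) ** 2        # analytical d(eta)/d(b2)

db = 1e-4 * b2                               # relative step of 0.0001
forward = (eta(b1, b2 + db, t) - eta(b1, b2, t)) / db
backward = (eta(b1, b2, t) - eta(b1, b2 - db, t)) / db
central = (eta(b1, b2 + db, t) - eta(b1, b2 - db, t)) / (2.0 * db)

err_forward = abs(forward - exact)
err_backward = abs(backward - exact)
err_central = abs(central - exact)
err_average = abs(0.5 * (forward + backward) - exact)

print("forward  error:", err_forward)
print("backward error:", err_backward)
print("central  error:", err_central)
print("average of forward and backward:", err_average)
```

The central difference needs two perturbed model evaluations per parameter instead of one, which is the extra cost referred to in the text.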


7.10.2 Sensitivity Equation Method 

When the model is a set of first-order ordinary differential equations or a 
partial differential equation, the solution frequently cannot be written 
explicitly. In such cases explicit expressions for the sensitivities cannot be 
given. One can, however, derive sensitivity equations which can be solved 
separately from the model. 

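Consider, for example, the model

$$k \frac{\partial^2 T}{\partial x^2} = c \frac{\partial T}{\partial t} \qquad (7.10.5)$$

where k and c are parameters. The boundary and initial conditions are

$$-k \left.\frac{\partial T}{\partial x}\right|_{x=0} = q, \qquad \left.\frac{\partial T}{\partial x}\right|_{x=L} = 0, \qquad T(x,0) = T_i \qquad (7.10.6)$$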









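In this case T represents the dependent variable η. Differentiating (7.10.5)
and (7.10.6) with respect to k yields

$$k \frac{\partial^2 X_1}{\partial x^2} + \frac{\partial^2 T}{\partial x^2} = c \frac{\partial X_1}{\partial t} \qquad (7.10.7a)$$

$$-k \left.\frac{\partial X_1}{\partial x}\right|_{x=0} = \left.\frac{\partial T}{\partial x}\right|_{x=0}, \qquad \left.\frac{\partial X_1}{\partial x}\right|_{x=L} = 0, \qquad X_1(x,0) = 0 \qquad (7.10.7b)$$

where X₁ = ∂T/∂k. Notice that (7.10.7a) contains a nonhomogeneous term
which is contained in (7.10.5). Solution of (7.10.7) provides a solution to
the sensitivity coefficient for k. Unlike the use of differences such as in
(7.10.1), which utilizes the program calculating η, a special computer
program must be written for calculating X₁ using (7.10.7); in many cases
this would involve a finite difference computer program. The sensitivity
equation for c is found by differentiating (7.10.5,6) with respect to c,

$$k \frac{\partial^2 X_2}{\partial x^2} = c \frac{\partial X_2}{\partial t} + \frac{\partial T}{\partial t} \qquad (7.10.8a)$$

$$\left.\frac{\partial X_2}{\partial x}\right|_{x=0} = 0, \qquad \left.\frac{\partial X_2}{\partial x}\right|_{x=L} = 0, \qquad X_2(x,0) = 0 \qquad (7.10.8b)$$

where X₂ = ∂T/∂c.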

From a knowledge of the solutions of the above equations one can
predict several results. First, the basic problem (7.10.5) and (7.10.6) is
linear and so are the sensitivity problems for X₁ and X₂. Second, X₁ is a
function of k and X₂ is a function of c. Hence T is a nonlinear function of
k and c. Third, in general X₁ and X₂ will not be zero at the boundaries
x = 0 and L. Fourth, in (7.10.8) the only nonhomogeneous term is ∂T/∂t
which, for monotonically increasing time values of T, will be positive; such
a term acts like a heat sink which will cause X₂ to be always negative
(provided T is monotonically increasing). Fifth, the ∂²T/∂x² in (7.10.7a),
which is equal to (c/k) ∂T/∂t, simulates a heat source but the ∂T/∂x at
x = 0 term in (7.10.7b) simulates a heat sink at x = 0. Hence X₁ could be
either positive or negative and would in general be smaller in magnitude
than X₂.

Another point relates to the relationships between T, Tᵢ, X₁, and X₂. By
multiplying (7.10.7) by k and by multiplying (7.10.8) by c and then adding
the corresponding equations together, one can show that

$$T(x,t) - T_i = -k X_1(x,t) - c X_2(x,t) \qquad (7.10.9)$$
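
Relation (7.10.9) can be checked numerically. The sketch below is a hypothetical stand-in, not PROGRAM PROPERTY: it solves the model with a simple explicit finite-difference scheme for a constant heat flux at x = 0 and insulation at x = L, forms kX₁ and cX₂ by forward differences with a relative step of 0.0001, and evaluates the residual of (7.10.9). The grid, time step, heating time of 10 sec, and initial temperature are assumptions; k, c, q, and L are the values quoted in Section 7.9.4, with k converted to Btu/sec-ft-F.

```python
def solve_temperature(k, c, q=2.67, L=1.0 / 12.0, T_init=81.54,
                      t_end=10.0, n_cells=20, dt=0.02):
    """Explicit finite differences for k*T_xx = c*T_t with a constant flux q
    at x = 0 and insulation at x = L. Returns nodal temperatures at t_end."""
    dx = L / n_cells
    fo = k * dt / (c * dx * dx)          # grid Fourier number
    assert fo <= 0.5, "explicit scheme is unstable for this time step"
    T = [T_init] * (n_cells + 1)
    for _ in range(round(t_end / dt)):
        new = T[:]
        # half control volume at the heated surface (energy balance with q)
        new[0] = T[0] + 2.0 * fo * (T[1] - T[0]) + 2.0 * q * dt / (c * dx)
        for i in range(1, n_cells):
            new[i] = T[i] + fo * (T[i + 1] - 2.0 * T[i] + T[i - 1])
        # half control volume at the insulated surface
        new[-1] = T[-1] + 2.0 * fo * (T[-2] - T[-1])
        T = new
    return T

k = 43.343 / 3600.0      # Btu/sec-ft-F
c = 55.596               # Btu/ft^3-F
T_init = 81.54           # assumed initial temperature, F
rel = 1e-4               # relative parameter step

T = solve_temperature(k, c)
T_k = solve_temperature(k * (1.0 + rel), c)
T_c = solve_temperature(k, c * (1.0 + rel))

kX1 = [(a - b) / rel for a, b in zip(T_k, T)]    # k * dT/dk at each node
cX2 = [(a - b) / rel for a, b in zip(T_c, T)]    # c * dT/dc at each node
residual = [(t - T_init) + a + b for t, a, b in zip(T, kX1, cX2)]

print("temperature rise at x = 0:", T[0] - T_init)
print("largest residual of (7.10.9):", max(abs(r) for r in residual))
print("k*X1 at x = 0 and x = L:", kX1[0], kX1[-1])
```

In this sketch cX₂ is negative at every node while kX₁ changes sign between the heated and insulated surfaces, in line with the qualitative predictions above.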





Insight regarding the sensitivity coefficients can be obtained from this
equation. For example, the larger T(x,t) - Tᵢ, the larger on the average are
the magnitudes of X₁ and X₂. Equation (7.10.9) also suggests that both
sensitivity coefficients need not be calculated because T must always be
calculated.

A final point is that in this case the parameters can be transformed so
that T is linear in one of them. This is true for the parameters α = k/c and
K = k⁻¹; with these parameters the model (7.10.5) and (7.10.6) becomes

$$\alpha \frac{\partial^2 T}{\partial x^2} = \frac{\partial T}{\partial t} \qquad (7.10.10a)$$

$$-\left.\frac{\partial T}{\partial x}\right|_{x=0} = Kq, \qquad \left.\frac{\partial T}{\partial x}\right|_{x=L} = 0, \qquad T(x,0) = T_i \qquad (7.10.10b)$$

The sensitivity coefficient for K, denoted X₃, is found from a solution of

$$\alpha \frac{\partial^2 X_3}{\partial x^2} = \frac{\partial X_3}{\partial t} \qquad (7.10.11a)$$

$$-\left.\frac{\partial X_3}{\partial x}\right|_{x=0} = q, \qquad \left.\frac{\partial X_3}{\partial x}\right|_{x=L} = 0, \qquad X_3(x,0) = 0 \qquad (7.10.11b)$$

Notice that X₃ is independent of K and hence T is linear in K. By
multiplying (7.10.11) by K and then comparing with (7.10.10), one can find

$$T(x,t) - T_i = K X_3(x,t) \qquad (7.10.12)$$

and thus X₃(x,t) can be very simply found from T(x,t) using this
equation.

In summary, we have given two basic methods for evaluating sensitivity
coefficients when the model is not an algebraic form but rather involves
solution of differential equations. It is assumed that these equations cannot
be readily solved to obtain closed forms since otherwise algebraic forms
can be written. We imagine that these equations are solved numerically
using finite differences or some other method. The first method of
evaluating the sensitivities involves only a computer program to approximate
the differential equations for the model; (7.10.1), which requires only
dependent variable values, is used. In general this is the simplest and
recommended method. The second method involves numerically approximating
the sensitivity equations, which are derived from the model. Examples of
sensitivity equations are given by (7.10.7) and (7.10.8). Even when the
sensitivity equations are not solved, inspection of these equations can
sometimes provide considerable insight into, and relations between, the
sensitivity coefficients.





REFERENCES 

1. Bard, Y., "Comparison of Gradient Methods for the Solution of Nonlinear Parameter
Estimation Problems," SIAM J. Numer. Anal. 7 (1970), 157-186.

2. Beveridge, G. S. G. and Schechter, R. S., Optimization: Theory and Practice,
McGraw-Hill Book Company, New York, 1970.

3. Levenberg, K., "A Method for the Solution of Certain Non-linear Problems in Least
Squares," Quart. Appl. Math. 2 (1944), 164-168.

4. Marquardt, D. W., "An Algorithm for Least Squares Estimation of Nonlinear
Parameters," J. Soc. Ind. Appl. Math. 11 (1963), 431-441.

5. Bard, Y., Nonlinear Parameter Estimation, Academic Press, Inc., New York, 1974.

6. Box, G. E. P., "Use of Statistical Methods in the Elucidation of Physical Mechanisms,"
Bull. Inst. Intern. Stat. 36 (1958), 215-255.

7. Booth, G. W. and Peterson, T. I., "Non-linear Estimation," IBM Share Program Pa. No.
687 WL NL1, 1958.

8. Hartley, H. O., "The Modified Gauss-Newton Method for the Fitting of Nonlinear
Regression Functions by Least Squares," Technometrics 3 (1961), 269-280.

9. Box, G. E. P. and Kanemasu, H., "Topics in Model Building, Part II, On Non-linear
Least Squares," Tech. Rep. No. 321, University of Wisconsin, Dept. of Statistics,
Madison, Wis., Nov. 1972.

10. Davies, M. and Whitting, I. J., "A Modified Form of Levenberg's Correction," in
Numerical Methods for Non-Linear Optimization, edited by F. A. Lootsma, Academic
Press, London, 1972, pp. 191-201.

11. Gallant, A. R., "Nonlinear Regression," Am. Stat. 29 (1975), 73-81.

12. Seinfeld, J. H. and Lapidus, L., Mathematical Models in Chemical Engineering, Vol. 3,
Process Modeling, Estimation and Identification, Prentice-Hall, Inc., Englewood Cliffs,
N.J., 1974.

13. Graupe, D., Identification of Systems, Van Nostrand Reinhold Company, New York,
1972.

14. Jones, A., "SPIRAL - a New Algorithm for Nonlinear Parameter Estimation Using
Least Squares," Comput. J. 13 (1970), 301-308.

15. Beale, E. M. L., "Confidence Regions in Nonlinear Estimation," J. Roy. Stat. Soc. B22
(1960), 41-76.

16. Guttman, I. and Meeter, D. A., "On Beale's Measures of Nonlinearity," Technometrics 7
(1965), 623-637.

17. Bacon, D. W. and Henson, T. L., "Statistical Design and Model Building," Course
notes, Dept. of Chemical Engineering, Queen's University, Kingston, Ontario.

18. Box, G. E. P., "Fitting Empirical Data," Ann. N.Y. Acad. Sci. 86 (1960), 792.

19. Box, G. E. P., "Some Notes on Nonlinear Estimation," Tech. Rep. No. 25, University of
Wisconsin, Dept. of Statistics, Madison, Wis., 1964.

20. Gallant, A. R., "The Power of the Likelihood Ratio Test of Location in Nonlinear
Regression Models," J. Am. Stat. Assoc. 70 (1975), 198-203.

21. Box, G. E. P. and Hunter, J. S., "A Confidence Region for the Solution of a Set of
Simultaneous Equations with an Application to Experimental Design," Biometrika 41
(1954), 190-199.

22. Box, G. E. P. and Hunter, W. G., "A Useful Method for Model-Building,"
Technometrics 4 (1962), 301-318.

23. Beck, J. V., "Determination of Optimum, Transient Experiments for Thermal Contact
Conductance," Int. J. Heat Mass Transfer 12 (1969), 621-633.

24. Beck, J. V., "Transient Sensitivity Coefficients for the Thermal Contact Conductance,"
Int. J. Heat Mass Transfer 10 (1967), 1615-1617.

25. Van Fossen, G. J., Jr., "Design of Experiments for Measuring Heat-Transfer
Coefficients with a Lumped-Parameter Calorimeter," NASA Tech. Note NASA TN
D-7857, Jan. 1975.

26. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables,
2nd ed., McGraw-Hill Book Company, New York, 1970.

27. Farnia, K., "Computer-Assisted Experimental and Analytical Study of
Time/Temperature-Dependent Thermal Properties of the Aluminum Alloy 2024-T351,"
Ph.D. Thesis, Dept. of Mechanical Engineering, Michigan State University, 1976.

28. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley & Sons, Inc.,
New York, 1966.


PROBLEMS 

7.1 Using OLS with the Gauss linearization method, estimate β in the model
η = 377/(1 + βt) for the data

    tᵢ    0.25   0.5   0.75   1    2    3
    Yᵢ    150    90    70     55   30   20

(a) Start with β = 6 being the initial estimate.

Answer. 6.0648.

(b) Start with β = 3 being the initial estimate.

(c) Start with β = 12 being the initial estimate.

7.2 Using OLS with the Gauss linearization method, estimate β₁ and β₂ in the
model η = β₁/(1 + β₂t) for the data given in Problem 7.1.

(a) Start with β₁ = 377 and β₂ = 5.8 being the initial estimates.

(b) Start with β₁ = 300 and β₂ = 4 being the initial estimates.

(c) Start with β₁ = 600 and β₂ = 8 being the initial estimates.

(d) Start with β₁ = 600 and β₂ = [illegible] being the initial estimates.

7.3 Using OLS with the Gauss linearization method, estimate β₀ and β₁ for the
model and data in Example 5.2.3.

7.4 For the model T = 81.5 + 198.3 exp(-βt), estimate using OLS and the Gauss
method the parameter β for the data given in Table 6.2. Use as the data the
Yᵢ values for observations 5, 9, 13, and 17.

Answer. 7.47295 × 10⁻⁴.





7.5 Repeat Problem 7.1 using the Box-Kanemasu modification.

7.6 Repeat Problem 7.4 using the Box-Kanemasu modification.

7.7 Repeat Problem 7.1a using the sequential method of Section 7.8.3.1.

7.8 For the model T = β₁ + β₂ exp(-β₃t), estimate the parameters β₁, β₂, and β₃ for
the Yᵢ data given in Table 6.2. Use a computer program incorporating the
sequential method. Use OLS.

Answer. 100.60, 178.83, 0.000882.

7.9 For the model given by

estimate the parameters using a sequential computer program with OLS for
the data given in Example 6.2.2. (Do not take logarithms of both sides.)

7.10 Repeat Problem 7.9 using as the sum of squares function.


7.11 Using the Box-Kanemasu approximation and the initial estimate of

10, estimate M for the problem of Section 7.5.1. Use a programmable
calculator or a computer.

7.12 Sometimes two functions are known and one or more parameters in one
function is to be adjusted to cause "agreement." An example is u tanh u,
which is to be approximated by u²/(1 + βu²) for "small" values of u.

(a) By using Taylor series show that "agreement" is obtained by letting

(b) Suggest a mathematical function which would be minimized to obtain
agreement in this case.

7.13 Another set of functions as in Problem 7.12 is u²/[2(1 + βu²)] and
uI₁(u)/I₀(u), where Iₙ(u) is a modified Bessel function. By using a Taylor
series show that "agreement" is obtained by letting
7.14 The following measurements and associated standard deviations are for the
model

    η = β₁ exp(-β₂t)

    tᵢ    0     0.125   0.25   0.375   0.5
    Yᵢ    101   66      44     28      20
    σᵢ    1     3       6      2       1

Use maximum likelihood estimation to estimate β₁ and β₂ assuming the
standard assumptions indicated by 11011111 are valid. Let the initial values
of β₁ and β₂ be 100 and 3, respectively. Use a sequential method of solution.
Also find the covariance matrix of the estimates and the minimum sum of
squares.

Answer. b₁ = 100.922, b₂ = 3.2877, P₁₁ = 0.9744, P₁₂ = 0.0245, P₂₂ = 0.007933,
S_min = 0.8502820.

7.15 When there are large numbers of equally spaced measurements in the region
0 < t < 1, the OLS sum of squares function is proportional to the integral,

$$S = \int_0^1 (Y - \eta)^2\, dt$$

For the special case of Y = t + t² and η = β₁t + β₂t², show that

$$S = \frac{z_1^2}{3} + \frac{z_2^2}{5} + \frac{z_1 z_2}{2}$$

where zᵢ = 1 - βᵢ. Use the method given in Section 6.8 to find the major and
minor axes of this curve. Plot for S = 1 and 10.

7.16 For the model η = β₁t + exp(-β₂t) and the data

    tᵢ    0   1   2
    Yᵢ    1   2   3

and using the OLS sum of squares function, show that for a fixed value of S
the value of β₁ is related to β₂ and S by

$$\beta_1 = \left\{ 8 - e^{-\beta_2} - 2e^{-2\beta_2} \pm \left[ 5S - (1 - e^{-\beta_2})^4 \right]^{1/2} \right\} / 5$$

(a) Plot the contour of S = 0.1 in the β₁, β₂ plane.

(b) From the discriminant in the expression for β₁ show that the β₂ value
can be as small as -ln[1 + (5S)^(1/4)] and can be as large as
-ln[1 - (5S)^(1/4)] provided S < 0.2. If S > 0.2, what is the maximum value of
β₂? What condition(s) must be given with respect to β₂ to insure that the
contour for fixed S is closed?

(c) Derive an expression for the β₁ value at the end point(s) of an S contour.
(At the end of the contour, the discriminant is equal to zero.) For S = 1,
show that the end point is β₁ = -1.3898 and β₂ = -0.9144.

(d) Expand S for small values of β₂ to obtain S ≈ 5[1 - β₁ + β₂]² and plot for
S = 0.1. Compare with the result of part (a).

(e) Show that |XᵀX| = [2 exp(-β₂)(1 - exp(-β₂))]². Find the location of the β₂
values corresponding to the minimum and maximum values of |XᵀX|. Plot
|XᵀX| versus β₂ for -1.5 < β₂ < 4.

7.17 Using the data of the preceding problem and the initial values of β₁ = 1.5
and β₂ = 1.5, estimate β₁ and β₂ using the Box-Kanemasu modification of
the Gauss method.

7.18 Show that the sensitivity coefficients for the model

    η = (1 + β₁t)/(β₂ + β₃t)

are linearly dependent for a certain combination of the parameter values.
[Hint: write the sensitivities in terms of [illegible] and R = [illegible].]
What can you conclude from this linear dependence? What new parameters
could be selected to eliminate the linear dependence mentioned above?

7.19 Repeat Problem 6.26 for the model

7.20 Using the data in Table 7.14 between 3.6 and 18.0 sec for temperature
histories from thermocouple 1 (which is at x = L) and thermocouple 5 (which
is at x = 0), estimate k and α in the model given by (8.5.25) with T₀ = 81.66°F,
L = 1/12 ft, and q = 2.67 Btu/ft²-sec. Let t in (8.5.25) be the times in Table
7.14 minus 3.3. In other words, 3.6 sec in Table 7.14 corresponds to time 0.3
sec in (8.5.25). Use as initial estimates k = 40 Btu/hr-ft-°F and α = 1 ft²/hr.
(Be careful with units.) Use OLS with the Gauss or other method.

7.21 Repeat Problem 7.20 but use the average of temperatures 1, 2, 3, and 4
instead of 1 and the average of 5, 6, 7, and 8 instead of 5.

7.22 Derive (7.5.16) and (7.5.17).

7.23 Derive (7.8.18).

7.24 Verify the sensitivity coefficient values given in Fig. 7.8 using the
approximate equation (7.10.1). A programmable calculator or computer should
be used. Investigate using values of δbⱼ = εbⱼ where ε is equal to (a) 0.01, (b)
0.001, and (c) 0.0001.



CHAPTER 8

DESIGN OF OPTIMAL EXPERIMENTS


8.1 INTRODUCTION 

Carefully designed experiments can result in greatly increased accuracy of 
the estimates. This has been demonstrated by various authors, but special 
mention should be made of the work of G. E. P. Box and collaborators. 
See, for example, Box and Lucas [1] and Box and Hunter [2]. An important
work on optimal experiments is the book by Fedorov [3]. 

In many areas of research, great flexibility is permitted in the proposed 
experiments. This is particularly true with the present ready accessibility of 
large-scale digital computers for analysis of the data and automatic digital 
data acquisition equipment for obtaining the data. This means that tran- 
sient, complex experiments can be performed that involve numerous 
measurements for many sensors. With this great flexibility comes the 
opportunity of designing experiments to obtain the greatest precision of 
the required parameters. A common measure of the precision of an 
estimator is its variance; the smaller the variance, the greater the precision. 
Information regarding the variances and covariances is included in de- 
termination of confidence regions. We shall utilize the minimum confi- 
dence region to provide a basis for the design of experiments having 
minimum variance estimators. 

The design of optimal experiments is complicated by the necessity of
adding practical considerations and constraints. The best design for a
particular case, for example, might have certain unique restrictions on the
dependent variable vector η or on the independent variables such as time
or position. The optimal design problem involves two parts: (1) the
determination of an objective function together with its constraints and (2)
the extremization of the objective function.

When we say that we desire to find the optimal experiment, we wish to
determine the conditions under which each observation should be taken in
order to extremize a certain optimal criterion. For example, the best
duration of the experiment may be needed or the optimal placement of
sensors may be required. In cases involving partial differential equations
the optimal boundary and initial conditions may also be needed. Many of
these cases are illustrated in subsequent sections.

In most of this chapter it is assumed that the form of the model is
known although it contains unknown parameters. If a form in terms of a
finite number of parameters is not known, the search for an optimal
strategy may be quite different. This involves discrimination, which is
discussed in Section 8.9.


8.2 ONE PARAMETER EXAMPLES

In order to illustrate optimal design, some one-parameter linear and
nonlinear examples are given in this section. The standard conditions
designated 11111-11 (see Section 6.1.5.2) are considered to be valid.

8.2.1 Linear Examples for One Parameter

8.2.1.1 Model ηᵢ = βXᵢ with No Constraints

Consider first the case of the linear model ηᵢ = βXᵢ. Owing to the standard
assumptions, ordinary least squares (OLS) and maximum likelihood (ML)
yield the same estimator and variance of

$$b = \frac{\sum_{i=1}^{n} Y_i X_i}{\Delta}, \qquad V(b) = \frac{\sigma^2}{\Delta} \qquad \text{where} \quad \Delta = \sum_{i=1}^{n} X_i^2 \qquad (8.2.1)$$

Observe that minimization of the variance of b implies the maximization of
Δ. Note that Δ is maximized by (a) making the maximum value of |X| as
large as possible, (b) concentrating all the n measurements at the maximum
permissible value of |X|, and (c) making n as large as possible.





This case illustrates the necessity of certain practical constraints. The
maximum value of |X| must be finite if |η| is to be finite. Next it must be
decided what constraints, if any, are to be placed on the measurements in a
fixed X range. If there are none, the optimal solution is to concentrate all
the measurements at the maximum |X|. In other cases where X is a
function of time, measurements at equal time intervals might be dictated
by the capabilities of the measuring equipment. The latter case is
emphasized in this text because of its common occurrence. Furthermore, equal
spacing of measurements usually provides more information for checking
the validity of the model and the statistical assumptions than does
concentrating the measurements at the maximum |X|.

8.2.1.2 Model η = βX(t) for Fixed Large n and Equally Spaced
Measurements

The model η = βX can represent cases where X is any known function of
time. (The word "time" is used but the results could also apply for other
variables such as position, temperature, etc.) For a large number of
observations, Δ can be approximated by

$$\Delta = \sum_{i=1}^{n} X^2(t_i) = \frac{1}{\Delta t} \sum_{i=1}^{n} X^2(t_i)\, \Delta t \approx \frac{n}{t_n} \int_0^{t_n} X^2(t)\, dt \qquad (8.2.2)$$

where tₙ = nΔt and the measurements are assumed to be uniformly spaced
in t over 0 < t < tₙ.

Example 8.2.1

Compare the value of Δ associated with n measurements of $\eta_i = \beta C t_n^m$ and for
$\eta_i = \beta C (i t_n / n)^m$ with i = 1, 2, ..., n, tᵢ > tⱼ for i > j, and where m is a nonnegative
exponent. Let n be large. C is an arbitrary constant which plays no role in this
problem but is included for later use for scaling η. Notice that the first case has all
the measurements concentrated at the location of the maximum η of the second
case.

Solution

For the first case the sensitivity X is $C t_n^m$ and then Δ obtained from (8.2.1) is
$n C^2 t_n^{2m}$. For the second case, (8.2.2) yields

$$\Delta \approx \frac{n}{t_n} \int_0^{t_n} C^2 t^{2m}\, dt = \frac{n C^2 t_n^{2m}}{2m + 1}$$

In both cases Δ is proportional to $n C^2 t_n^{2m}$ and thus is made larger by increasing n,
C², or tₙ. The ratio of the first Δ to the second is 2m + 1. Hence for all models with
m > 0, β can be estimated more accurately by concentrating all the measurements
at the maximum η; this becomes more apparent as m is increased in value. For
dynamic experiments, however, measurements uniformly spaced in time are usually
more appropriate than concentrated ones.
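
The ratio 2m + 1 found in Example 8.2.1 can be confirmed by summing the squared sensitivities directly rather than using the integral approximation. The following is a minimal sketch; the values of n, C, tₙ, and the exponents m are arbitrary choices made only for illustration.

```python
def delta_concentrated(n, C, t_n, m):
    """Delta = sum of X_i^2 when all n measurements are taken at t_n."""
    return n * (C * t_n ** m) ** 2

def delta_uniform(n, C, t_n, m):
    """Delta = sum of X_i^2 for measurements at t_i = i * t_n / n."""
    return sum((C * (i * t_n / n) ** m) ** 2 for i in range(1, n + 1))

n, C, t_n = 10000, 3.0, 2.0          # illustrative values only
ratios = {}
for m in (0, 0.5, 1, 2, 3):
    ratios[m] = delta_concentrated(n, C, t_n, m) / delta_uniform(n, C, t_n, m)
    print("m =", m, " ratio =", round(ratios[m], 4), " 2m + 1 =", 2 * m + 1)
```

For finite n the computed ratio falls slightly below 2m + 1, and the difference shrinks as n grows, which is consistent with the large-n assumption behind (8.2.2).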

In this section the constraint of a fixed but large number of
observations, n, is investigated. In any practical experiment n must be finite. No
constraint is placed on the magnitude of X(t) or, equivalently, on η. For
the case of n equally spaced measurements starting at t = 0 and ending at
tₙ, the criterion for optimum measurements for the linear model η = βX(t)
is to maximize

$$\Delta^n \equiv \frac{\Delta}{n} = \frac{1}{t_n} \int_0^{t_n} X^2(t)\, dt \qquad (8.2.3)$$

with respect to tₙ. It is assumed that n is large and the standard
assumptions 11111-11 are valid. Notice that Δⁿ is a function of tₙ but not n; tₙ is
now simply the maximum t.

A necessary condition for Δⁿ to be a maximum with respect to the
duration of the experiment tₙ is that

$$\left. \frac{\partial \Delta^n}{\partial t_n} \right|_{t_n = \tau} = 0 = -\frac{1}{\tau^2} \int_0^{\tau} [X(t)]^2\, dt + \frac{1}{\tau} [X(\tau)]^2 \qquad (8.2.4)$$

where τ is the time tₙ that maximizes Δⁿ. (8.2.4) can also be written as

$$[X(\tau)]^2 = \tau^{-1} \int_0^{\tau} [X(t)]^2\, dt = \Delta^n(\tau) \qquad (8.2.5)$$

This expression is interesting because it provides insights into conditions
that permit an optimum time to exist. In words, (8.2.5) states that at the
optimum time the square of the sensitivity must equal the average value of
the squared sensitivity.

Example 8.2.2

The velocity distribution for laminar flow between parallel plates separated by the
distance H is

$$u = 4 u_0 \frac{y}{H} \left( 1 - \frac{y}{H} \right)$$

where u₀ is the maximum velocity and y is the distance measured from one wall.
Suppose that the value u₀ is the parameter of interest and that u is to be measured at
equal intervals from y = 0 to where Δⁿ is maximized. Find the y value to maximize
Δⁿ. This would provide an optimal experiment for this case provided the standard
assumptions are valid relative to errors in u and the y measurements are errorless.

Solution

The optimal distance y can be found by using (8.2.5) with t being y/H and
X(t) = 4t(1 - t); since the factor 4 cancels from both sides of (8.2.5), the optimal
value of τ = y/H is then found from

$$\tau^2 (1 - \tau)^2 = \tau^{-1} \int_0^{\tau} t^2 (1 - t)^2\, dt = \frac{\tau^2}{3} - 0.5 \tau^3 + 0.2 \tau^4$$

which simplifies to the algebraic equation 24τ² - 45τ + 20 = 0; this is a simple
quadratic equation which can be solved for τ = 0.724 and 1.15, but only the first is
physically possible. Hence the optimal maximum y is 0.724H.
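
The result of Example 8.2.2 can be checked in two independent ways: by solving the quadratic equation, and by evaluating the criterion numerically on a grid of candidate end points. The sketch below does both. The midpoint integration rule, the grid spacing of 0.001, and the function names are illustrative assumptions; the constant factor in the sensitivity is dropped because it does not move the maximum.

```python
import math

def X(t):
    """Sensitivity shape for the parallel-plate profile, with t = y/H."""
    return t * (1.0 - t)

def criterion(tau, steps=400):
    """(1/tau) * integral from 0 to tau of X(t)^2 dt, by the midpoint rule."""
    h = tau / steps
    return sum(X((i + 0.5) * h) ** 2 for i in range(steps)) * h / tau

# Roots of 24 tau^2 - 45 tau + 20 = 0.
disc = math.sqrt(45.0 ** 2 - 4.0 * 24.0 * 20.0)
roots = [(45.0 - disc) / 48.0, (45.0 + disc) / 48.0]

# Direct search over admissible end points 0 < y/H <= 1.
taus = [i / 1000.0 for i in range(1, 1001)]
best = max(taus, key=criterion)

print("roots of the quadratic:", roots)
print("grid maximum of the criterion at y/H =", best)
```

The grid search confirms that the smaller root is a maximum of the criterion rather than a minimum, which (8.2.5) alone does not guarantee.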

Now (8.2.5) is a necessary, but not sufficient, condition. It is also true at
relative maxima and minima as well as the true maximum. For the maxima
there are a number of possibilities with respect to (8.2.5). First, it might not
be satisfied at any finite time, thus indicating that there is no maximum at
a finite time. Next, it might be satisfied at all times t, indicating that the
maximum is attained at all times t > 0. Also (8.2.5) could be satisfied at one
as well as many values of t. Each of these cases is illustrated below.

Some general observations can be drawn from (8.2.5). Visualize a
function |X(t)| that is zero at t = 0, increases monotonically with t until
some time, and then decreases monotonically to zero. Such an X(t) is
shown in Fig. 8.1. Here

$$X(\tau) = \tau \exp(1 - \tau) \qquad (8.2.6)$$

As long as |X| is increasing, the instantaneous X² must be larger than the
average and consequently a maximum in Δⁿ cannot occur when |X| is
monotonically increasing. After the maximum X², the instantaneous value
of X² decreases, but the average continues to rise for a while; for any |X|
function reaching a maximum monotonically and then decreasing, the
maximum Δⁿ must be at some time greater than the time at which |X|
has a maximum.

Consider now the X(τ) function given by (8.2.6), which is shown in Fig.
8.1 along with Δⁿ and X². The maximum of Δⁿ is at τ = 1.691817 (see
Problem 8.1). Notice that the X² curve crosses Δⁿ at its maximum as
indicated by (8.2.5).
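
The optimum time quoted for (8.2.6) can be reproduced by solving condition (8.2.5) numerically. The following sketch is one possible way of doing so and is not the method of Problem 8.1: it evaluates the average of X² with Simpson's rule and locates the crossing of X² and its average by bisection. The bracketing interval from 1 to 3 and the number of integration intervals are assumptions chosen for this example.

```python
import math

def X(t):
    """Sensitivity of (8.2.6): X = t * exp(1 - t)."""
    return t * math.exp(1.0 - t)

def mean_square(tau, n=2000):
    """(1/tau) * integral from 0 to tau of X^2 dt, composite Simpson rule."""
    h = tau / n                       # n must be even
    s = X(0.0) ** 2 + X(tau) ** 2
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * X(i * h) ** 2
    return s * h / 3.0 / tau

def residual(tau):
    """Left side minus right side of (8.2.5)."""
    return X(tau) ** 2 - mean_square(tau)

lo, hi = 1.0, 3.0                     # residual is positive at 1, negative at 3
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if residual(mid) > 0.0:
        lo = mid
    else:
        hi = mid
tau_opt = 0.5 * (lo + hi)

print("optimum duration tau =", tau_opt)
print("X^2 and its average there:", X(tau_opt) ** 2, mean_square(tau_opt))
```

The root lies to the right of τ = 1, where |X| has its maximum, as argued in the text.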

Condition (8.2.5) can be readily used for other cases. Several special
cases are now considered; see Figs. 8.2a and 8.2b. Case 1 is for a constant
X, and thus the average of $X^2$ is equal to $X^2$ at all times; hence all times
can represent optimum conditions. Case 2 is for X an exponential which
increases asymptotically to unity. The maximum $\Delta^n$ occurs only at infinite
t, but a maximum is closely approximated in a finite time. Case 3 is for a
decaying exponential, which has a maximum at $t = 0$. Case 4 has a
monotonically increasing X and, as a consequence, a monotonically increasing $\Delta^n$.
Further cases are shown in Figs. 8.3a and 8.3b. Case 5 is a cosine, which
has a maximum $\Delta^n$ at $t = 0$. Case 6 is the sine function, which has a
maximum $\Delta^n$ at $t = 2.2467$. Both these sinusoidal cases have numerous
maxima or minima, but only one global maximum. The final case, case 7,
has its function X depicted in Fig. 8.3a and its $\Delta^n$ shown in Fig. 8.3b. X is
first positive, becomes negative, and asymptotically approaches -0.5. This
case has a local maximum of $\Delta^n$ near $t \approx 0.4$, but the true maximum occurs
at infinity.
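Case 6 can be reproduced with a coarse scan. The sketch assumes $\Delta^n = t_n^{-1}\int_0^{t_n}\sin^2 t\,dt$, written in closed form; the scan range and step are arbitrary choices of ours.

```python
import math

def delta_n_sine(t_n):
    """Delta^n = (1/t_n) * integral_0^t_n sin(t)^2 dt, in closed form."""
    return 0.5 - math.sin(2.0 * t_n) / (4.0 * t_n)

# Scan durations 0 < t_n <= 20 (range and step are assumptions).
step = 0.001
grid = [step * i for i in range(1, 20001)]
t_best = max(grid, key=delta_n_sine)
delta_best = delta_n_sine(t_best)
print(t_best, delta_best)
```

The scan also shows the later local maxima, all of which are lower than the first one.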

Example 8.2.3

A point on a rotating wheel is observed normal to its axis and is seen to move a
distance y with respect to the axis. The known model is $y = \beta\sin\omega t$, where $\omega$ is the
known angular velocity. The measurements of t can be assumed to be errorless, but
those of y satisfy the standard conditions. A large number of uniformly spaced
measurements are to be taken starting at $t = 0$. For an optimal test to estimate $\beta$,
what should be the duration of the test?

Solution

The conditions of this problem are those of case 6 above, for which $\Delta^n$ has a
global maximum at $\omega t = 2.2467$ radians. Hence the duration of the test should be
$t_n = 2.2467/\omega$.





Figure 8.2b $\Delta^n$ criterion for the sensitivities given in Fig. 8.2a. The only constraint is that of
a fixed large n.

Figure 8.3b $\Delta^n$ criterion for the sensitivities given in Fig. 8.3a. The only constraint is that of
a fixed large n.

8.2.1.3 Model $\eta = \beta X(t)$ for Fixed Large n and Fixed Maximum Value of $\eta$

For some models of $\eta$ there are upper bounds of $|\eta|$ implicit in the model.
Examples are the $\eta$ models of $\beta C\sin t$, $\beta C\cos t$, $\beta Ce^{-t}$, and $\beta C\tanh 3t$. In
each of these cases the largest possible $|\eta|$ is $|\beta C|$. In other cases, such as
for $\eta = \beta Ct$, there are no implicit limits on $|\eta|$ if the t range is not
restricted. For both types of $\eta$'s the maximum $|\eta|$ might be specified to be
$\eta_{\max}$ by appropriately adjusting C. For the model $\eta = \beta C\sin t$ and $\beta > 0$, C
would be equal to $\eta_{\max}/(\beta\sin t_n)$ for $0 < t_n < \pi/2$ and $\eta_{\max}/\beta$ for $t_n > \pi/2$.

For another example, let $\eta$ be temperature, t time, and Q the rate of energy
input; for some physical conditions $\eta = \beta Qt$ and the maximum temperature
($\eta$) is known and is to be attained by adjusting the energy input Q. In
the following analysis the maximum $|\eta|$ is to be $\eta_{\max}$ for both types of
models; this is equivalent to prescribing the maximum $|X|$ since
$\eta_{\max} = |\beta|\,|X|_{\max}$.

The derivation of a criterion starts with $\Delta^n$, which includes the constraint
of fixed large n for measurements uniformly spaced in t. The problem is to
maximize $\Delta^n$ subject to the constraint of the maximum $X^2$ being equal to
exactly $X^2_{\max}$. Let $X(t) = Cf(t)$, where C is to be adjusted to make
$\max X^2 = X^2_{\max}$. Let $X_{\max}$ be the positive square root of $X^2_{\max}$. Then
$X^2_{\max} = C^2 f^2_{\max}$, where $f_{\max}$ designates the maximum value of $|f|$. We can
write $\Delta^n$ as

$$\Delta^n = X^2_{\max}\, t_n^{-1}\int_0^{t_n}\left[\frac{X(t)}{X_{\max}}\right]^2 dt$$

Observe that $(X/X_{\max})^2 = (f/f_{\max})^2$ is independent of C. Then for arbitrary
values of C (or $X_{\max}$) the criterion to maximize is

$$\Delta^+ = \Delta^n X_{\max}^{-2} = t_n^{-1}\int_0^{t_n}[X^+(t)]^2\,dt; \qquad X^+(t) \equiv \frac{X(t)}{X_{\max}} \tag{8.2.7}$$

Although $X^+$ is indicated to have a dependence only on t, it may also
depend on $t_n$. For example, for $0 < t < t_n < \pi/2$, $X^+ = \sin t/\sin t_n$ for
$\eta = \beta C\sin t$.

As the first example of the use of the criterion given by (8.2.7), let
$X(t) = Ct^m$, $t > 0$, $C > 0$, and then $X_{\max} = Ct_n^m$. Hence $X^+ = (t/t_n)^m$, where
m is an exponent equal to or greater than zero. Using (8.2.7), $\Delta^+$ for this
case is

$$\Delta^+ = \frac{1}{t_n}\int_0^{t_n}\left(\frac{t}{t_n}\right)^{2m} dt = \frac{1}{2m + 1} \tag{8.2.8}$$

Note that this result is independent of $t_n$ and unlike the similar case
treated in Example 8.2.1 where no constraints are used. It is also unlike the
result of the single constraint of fixed large n; see case 4 in Fig. 8.2b. The result
given by (8.2.8) means that there is no unique optimum time $t_n$ for $X = Ct^m$,
$\Delta^+$ being the same value in each case.
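A quick numerical check of the result $1/(2m+1)$ is sketched below. The function name and the midpoint-rule quadrature are our own choices; the constant C never appears because it cancels in $X^+ = (t/t_n)^m$.

```python
def delta_plus_power(m, t_n, panels=2000):
    """Delta^+ of (8.2.7) for X(t) = C*t**m, using X^+ = (t/t_n)**m.

    The integral is evaluated with the midpoint rule.
    """
    h = t_n / panels
    total = sum((((i + 0.5) * h) / t_n) ** (2 * m) for i in range(panels))
    return total * h / t_n

# The value should not depend on the duration t_n.
for m in (0, 1, 2.5):
    for t_n in (0.5, 3.0, 40.0):
        print(m, t_n, delta_plus_power(m, t_n), 1.0 / (2 * m + 1))
```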





For the case of $X = C\sin t$ the use of (8.2.7) yields

$$\Delta^+ = \begin{cases} \dfrac{1}{t_n}\displaystyle\int_0^{t_n}\frac{\sin^2 t}{\sin^2 t_n}\,dt = \dfrac{2 - t_n^{-1}\sin 2t_n}{4\sin^2 t_n} & \text{for } 0 < t_n < \dfrac{\pi}{2} \\[2ex] \dfrac{1}{4}\left(2 - t_n^{-1}\sin 2t_n\right) & \text{for } \dfrac{\pi}{2} < t_n \end{cases} \tag{8.2.9}$$

The latter portion of the curve is the same as given for case 6 in Fig. 8.3b,
and the 0 to $\pi/2$ portion is shown as the dashed curve in the same figure.
Note that the maximum is unaffected by the constraint of a fixed range of
$\eta$. The same is true for the maximum $\Delta^+$ for the $X = C\cos t$ curve (see case 5 in
Fig. 8.3b).

Based on the above examples, the shape of the $\Delta^+$ curve may or may not
be affected by the $\eta$ range constraint. Also, the location of the maximum
might or might not be changed.


8.2.2 One-Parameter Nonlinear Cases

For one-parameter nonlinear cases with the standard assumptions of
11111-11 valid, the variance of the estimator b of $\beta$ in the model $\eta_i = \eta(\beta, t_i)$
is approximately

$$V(b) \approx \sigma^2\left[\sum_{i=1}^{n} X_i^2\right]^{-1} \quad\text{where}\quad X_i \equiv \frac{\partial\eta_i}{\partial\beta} \tag{8.2.10}$$

Again the optimal experiment involves maximizing the sum of the squares
of the sensitivity coefficients, which minimizes the variance. As in the linear case, the optimal
unconstrained experiment would involve locating all the observations at the
maximum possible $|X|$. An analysis is given below for cases for which it is
more practical to use uniformly spaced measurements in t.

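The constraints of (1) a fixed number n of equally spaced measurements
between $t = 0$ and $t_n$ and (2) a maximum value of $|\eta|$ designated $\eta_{\max}$ are
to be included in the analysis. For large n, (8.2.2) permits writing

$$\Delta \approx \frac{n}{t_n}\int_0^{t_n} X^2\,dt = n\left(\frac{\eta_{\max}}{\beta}\right)^2\left[(\eta^+_{\max})^{-2}\,t_n^{-1}\int_0^{t_n}[X^+(t)]^2\,dt\right] \tag{8.2.11}$$

$$X^+ \equiv \frac{\beta}{\eta_{nom}}\frac{\partial\eta}{\partial\beta}, \qquad \eta^+_{\max} \equiv \frac{\eta_{\max}}{\eta_{nom}} \tag{8.2.12}$$

where $\eta_{nom}$ is some nominal value of $\eta$ which is chosen to make $\eta^+_{\max}$ a
function of $t_n$ only and not of t or $\beta$; $\eta_{nom}$ itself is not a function of $t_n$.

Note that maximizing $\Delta$ subject to the constraints of fixed n and $\eta_{\max}$ is
equivalent to maximizing the term inside the brackets of (8.2.11), which is
defined to be

$$\Delta^+ = (\eta^+_{\max})^{-2}\,t_n^{-1}\int_0^{t_n}[X^+(t)]^2\,dt \tag{8.2.13}$$

For linear models this criterion is equivalent to that given by (8.2.7).

A necessary condition to maximize $\Delta^+$ with respect to $t_n$ is found from
$\partial\Delta^+/\partial t_n = 0$, which yields

$$\Delta^-(\tau) = [X^+(\tau)]^2\left\{1 + \left[\frac{2\tau}{\eta^+_{\max}(\tau)}\right]\left[\frac{\partial\eta^+_{\max}(\tau)}{\partial t_n}\right]\right\}^{-1} \tag{8.2.14}$$

where $\tau$ is the value of $t_n$ maximizing $\Delta^+$ and $\Delta^-(\tau)$ is defined by

$$\Delta^-(\tau) = \tau^{-1}\int_0^{\tau}[X^+(t)]^2\,dt \tag{8.2.15}$$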

As an example of a nonlinear model let $\eta$ be given by

$$\eta = C\exp(-\beta t) \tag{8.2.16}$$

for which it is convenient to make $\eta_{nom}$ equal to C. Then $X^+$ becomes

$$X^+ = -\beta t\exp(-\beta t) \tag{8.2.17}$$

which has a maximum amplitude at $t^+ = \beta t = 1$ as shown in Fig. 8.4. For
this case the maximum $\eta$ occurs at $t = 0$ so that $\eta^+_{\max}$ is equal to unity. The
location of the maximum $\Delta^+$ defined by (8.2.13) is at $\beta t = 1.691817$. Unlike
linear parameter estimation cases, the possible dependence of $\Delta^+$ on $\beta$
complicates optimal design in nonlinear cases. For the present case this
difficulty is not as severe as it first may seem, however. Notice that $\Delta^+$
shown in Fig. 8.4, though having a unique maximum, has values within
80% of its maximum for the large range of $t^+ = \beta t$ between one and three.
Hence in this example an accurate initial estimate of $\beta$ is not necessary to
obtain a good experiment design. Another approach is to note that since
the optimal $\Delta^+$ occurs when $\eta/C = 0.184$, the duration could be selected to
make $\eta/C$ approximately this value.
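The flatness of the criterion near its peak can be verified directly. The sketch below assumes $\eta_{nom} = C$ (so $\eta^+_{\max} = 1$) and uses the closed form of $t^{-1}\int_0^{t} u^2 e^{-2u}\,du$; the function name and scan range are our own.

```python
import math

def delta_plus(tp):
    """Delta^+ for eta = C*exp(-beta*t) with eta_nom = C (eta+_max = 1).

    tp is the dimensionless duration beta*t_n; this is the closed form of
    (1/tp) * integral_0^tp (u*exp(-u))**2 du.
    """
    return (0.25 - math.exp(-2 * tp) * (tp * tp / 2 + tp / 2 + 0.25)) / tp

step = 0.0001
grid = [step * i for i in range(1000, 60001)]   # 0.1 <= beta*t_n <= 6
tp_best = max(grid, key=delta_plus)
peak = delta_plus(tp_best)

ratio_at_1 = delta_plus(1.0) / peak     # fraction of peak at beta*t_n = 1
ratio_at_3 = delta_plus(3.0) / peak     # fraction of peak at beta*t_n = 3
eta_over_c = math.exp(-tp_best)         # eta/C at the optimal duration
print(tp_best, ratio_at_1, ratio_at_3, eta_over_c)
```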
Another example which has a model related to (8.2.16) is

$$\eta = C[1 - \exp(-\beta t)] \tag{8.2.18}$$

which has the sensitivity $X^+$ of $\beta t\exp(-\beta t)$. In this example $\eta$ initially
increases with time so that $\eta^+_{\max} = 1 - \exp(-\beta t_n)$. Using (8.2.14) for
determining the optimal duration $\tau$ gives

$$\Delta^-(\tau) = \frac{[\beta\tau\exp(-\beta\tau)]^2}{1 + [2\beta\tau\exp(-\beta\tau)][1 - \exp(-\beta\tau)]^{-1}} \tag{8.2.19}$$

where $\Delta^-$ is the same as $\Delta^+$ of Fig. 8.4. This condition is satisfied only
near $\tau = 0$, where $\Delta^+ = \frac{1}{3}$. For small t values $\eta$ is approximately $C\beta t$, which
has this $\Delta^+$ value [see (8.2.8)], whereas for larger t values $\Delta^+$ decreases.
Again it is observed that the constraint on the maximum value of $|\eta|$
changes the optimal conditions.


Example 8.2.4

Consider again the example of the cooling billet investigated in Example 6.2.3 and
Sections 7.5.2 and 7.9.2. The billet was heated to the temperature $T_0$ and then
allowed to cool in open air at a temperature $T_\infty$ (301 K). Though several
models were considered for this billet, let us consider here only Model 1, which is
[see (7.5.10)] $(T - T_\infty)/(T_0 - T_\infty) = \exp(-\beta t)$. The parameter to be estimated is
$\beta$. The optimal duration of the experiment is to be found for a large number of
measurements uniformly spaced in t starting at $t = 0$. The initial temperature
difference $T_0 - T_\infty$ can be set by simply heating the billet before placing it in the
air. The air temperature $T_\infty$ is a fixed known value. The temperature T must be
between $T_\infty$ and $T_0$.

Solution

The model can be considered to be

$$\eta = T - T_\infty = C\exp(-\beta t) \quad\text{where}\quad C = T_0 - T_\infty$$

This model is identical to (8.2.16), and $\eta_{nom}$ is also C. Furthermore, $\eta^+_{\max} = \eta_{\max}/\eta_{nom}
= 1$. With the condition of a large number of equally spaced measurements, this
example is the same as considered for (8.2.16), for which we found the optimal
experiment to have the duration $\tau \approx 1.69/\beta$. This time $\tau$ can be compared with the
value actually used. From Fig. 7.13 the $\beta$ value is about 2.7 hr$^{-1}$, and thus
$\tau = 1.69/2.7 \approx 0.63$ hr. From Table 6.2 the maximum time is 1536 sec or 0.427 hr;
this time corresponds to $\beta t = 1.15$, at which time $\Delta^+$ in Fig. 8.4 is about 90% of its
maximum value. Hence for estimating $\beta$ in Model 1, the experiment was well
designed. For the more complicated models discussed in Section 7.5.2, the optimal
duration may be different.
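The comparison in this example can be reproduced as follows. The sketch assumes the estimate of 2.7 per hour and the final time of 1536 sec quoted above, and evaluates the criterion for $\eta = C\exp(-\beta t)$ in closed form rather than reading it from Fig. 8.4.

```python
import math

beta = 2.7                    # hr^-1, estimate quoted from Fig. 7.13
tau_opt = 1.691817 / beta     # optimal duration, hours
t_used = 1536.0 / 3600.0      # longest time in Table 6.2, hours

def delta_plus(tp):
    """Delta^+ versus beta*t_n for eta = C*exp(-beta*t), eta+_max = 1."""
    return (0.25 - math.exp(-2 * tp) * (tp * tp / 2 + tp / 2 + 0.25)) / tp

# Fraction of the peak criterion value reached by the actual test.
fraction_of_peak = delta_plus(beta * t_used) / delta_plus(1.691817)
print(tau_opt, t_used, fraction_of_peak)
```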


8.2.3 Iterative Search Method 

One obvious way to maximize $\Delta$, $\Delta^n$, or $\Delta^+$ is to plot it versus $t_n$ and then
observe the value of $t_n$ which maximizes the selected $\Delta$ function. A more
direct procedure is to linearize in a similar fashion as in the Gauss method.
Let us illustrate the method by considering $\Delta^n$. Assume that a maximum
exists and that an estimate of the optimal $\tau$ is $\tau^{(k)}$. A necessary condition at
the maximum $\Delta^n$ is given by (8.2.5). Expanding both sides using a
truncated Taylor series gives

$$[X^{(k)}]^2 + 2X^{(k)}\left(\frac{\partial X}{\partial t}\right)^{(k)}\Delta\tau^{(k)} = [\tau^{(k)}]^{-1}\int_0^{\tau^{(k)}}[X(t)]^2\,dt + [\tau^{(k)}]^{-1}\left\{[X^{(k)}]^2 - [\tau^{(k)}]^{-1}\int_0^{\tau^{(k)}}[X(t)]^2\,dt\right\}\Delta\tau^{(k)} \tag{8.2.20}$$

where the superscript (k) denotes evaluation at $t = \tau^{(k)}$. This can be solved for
the correction $\Delta\tau^{(k)}$ to get

$$\Delta\tau^{(k)} = -A^{(k)}\left[2X^{(k)}\left(\frac{\partial X}{\partial t}\right)^{(k)} - \frac{A^{(k)}}{\tau^{(k)}}\right]^{-1} \tag{8.2.21a}$$

$$A^{(k)} = [X^{(k)}]^2 - [\tau^{(k)}]^{-1}\int_0^{\tau^{(k)}}[X(t)]^2\,dt \tag{8.2.21b}$$

A few points can be made in connection with the iterative procedure for
finding the optimal $\tau$ given by (8.2.21). First, an initial estimate of $\tau$ is
needed. A reasonable value to use is twice the t value at which $|X|$ is a
maximum. This is the value that is found for $\tau^{(1)}$ if one starts at $\tau^{(0)}$
corresponding to the maximum value of $|X|$. Second, improved values of $\tau$
are given by

$$\tau^{(k+1)} = \tau^{(k)} + \Delta\tau^{(k)} \tag{8.2.22}$$

Third, in order to be sure that each iteration helps to increase $\Delta^n$, the
values of $\Delta^n$ should also be calculated and compared as one proceeds. If $\Delta^n$
should decrease, a smaller $\Delta\tau$ should be selected. (It is also possible that the
method is leading to a minimum rather than a maximum.) Fourth, $\tau$ cannot
be negative even though this procedure seeks a maximum in the region
$-\infty < \tau < \infty$. The procedure based on (8.2.21) is not appropriate if the
maximum $\Delta^n$ occurs at the boundary point $\tau = 0$ and $\partial\Delta^n/\partial t_n$ is not zero
there. See case 3 in Fig. 8.2b. Finally, the procedure terminates when $|\Delta\tau^{(k)}|$ is
much smaller than $\tau^{(k)}$.
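The iteration can be sketched for the sensitivity $X(t) = t\exp(1-t)$ of (8.2.6). The correction used below is the Newton step obtained by linearizing both sides of the condition "$X^2(\tau)$ equals its running average", which we take to be the content of (8.2.5) and (8.2.21); the step-halving safeguard, the tolerances, and all function names are our own assumptions.

```python
import math

def x(t):
    """Sensitivity X(t) = t*exp(1 - t) of (8.2.6)."""
    return t * math.exp(1.0 - t)

def dx_dt(t):
    """Time derivative of X(t)."""
    return (1.0 - t) * math.exp(1.0 - t)

def delta_n(tau, panels=2000):
    """Delta^n = (1/tau) * integral_0^tau X^2 dt (composite Simpson)."""
    h = tau / panels
    total = x(0.0) ** 2 + x(tau) ** 2
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * x(i * h) ** 2
    return total * h / 3.0 / tau

def optimal_duration(tau, rel_tol=1e-8, max_iter=50):
    """Iterate tau <- tau + d_tau until |d_tau| is much smaller than tau."""
    for _ in range(max_iter):
        a = x(tau) ** 2 - delta_n(tau)          # residual of the condition
        d_tau = -a / (2.0 * x(tau) * dx_dt(tau) - a / tau)
        # Halve the step while it would lower Delta^n or make tau <= 0.
        while tau + d_tau <= 0.0 or delta_n(tau + d_tau) < delta_n(tau):
            d_tau *= 0.5
            if abs(d_tau) < rel_tol * tau:
                break
        tau += d_tau
        if abs(d_tau) < rel_tol * tau:
            break
    return tau

tau_start = 2.0 * 1.0     # twice the time (t = 1) at which |X| peaks
tau_opt = optimal_duration(tau_start)
print(tau_opt)
```

Starting from the peak of $|X|$ itself, where the time derivative vanishes, the same step gives a correction equal to the current estimate, which is why twice that time is a natural starting value.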


8.3 CRITERIA FOR OPTIMAL EXPERIMENTS FOR MULTIPLE
PARAMETERS

8.3.1 General Criteria

When there are two or more parameters to estimate, the choice of a
criterion to indicate the optimal design of the experiments is less
straightforward than for the case of one parameter. Many criteria have been
proposed. They are usually given in terms of $\mathbf{X}^T\mathbf{X}$. For both linear and
nonlinear estimation, $\mathbf{X}$ represents the sensitivity matrix. Recall that the
covariance matrix of the estimator vector b is $(\mathbf{X}^T\mathbf{X})^{-1}\sigma^2$ for the standard
assumptions of additive, zero mean, constant variance, independent,
normal measurement errors in Y; additional assumptions are that there are no
errors in the independent variables and that $\boldsymbol\beta$ is a constant parameter
vector with no prior information. The value of $\sigma^2$ need not be known.
These assumptions are designated 11111-11. For these assumptions OLS,
Gauss-Markov, ML, and MAP all give the same estimator.

Some of the criteria which have been suggested in terms of $\mathbf{X}^T\mathbf{X}$ are as
follows: (a) maximization of the determinant of $\mathbf{X}^T\mathbf{X}$ (or equivalently, the
maximization of the product of the eigenvalues of $\mathbf{X}^T\mathbf{X}$), (b) maximization
of the minimum eigenvalue of $\mathbf{X}^T\mathbf{X}$, and (c) maximization of the trace of
$\mathbf{X}^T\mathbf{X}$. These criteria are listed by Badavas and Saridis [4], who used the
second criterion. Additional criteria are listed on p. 52 of Fedorov [3].
McCormack and Perlis [5] used a criterion similar in principle to (c). We
recommend the first one because it is equivalent to minimizing the
hypervolume of the confidence region (provided the assumptions 11111-11 are
valid). A criterion similar to $\max|\mathbf{X}^T\mathbf{X}|$ was used by Smith [7] as early as
1918. The best-known early work involving $\max|\mathbf{X}^T\mathbf{X}|$ was reported by Box
and Lucas [1] in 1959, however.
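For two parameters the three criteria can be evaluated by hand, as the following sketch shows. The two designs are hypothetical illustrations (each row holds the sensitivities $X_1$, $X_2$ for one measurement); they are not taken from the text.

```python
import math

def xtx(rows):
    """Return the entries (a, b, c) of X^T X = [[a, b], [b, c]] for p = 2."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    return a, b, c

def criteria(rows):
    """Determinant, smallest eigenvalue, and trace of X^T X."""
    a, b, c = xtx(rows)
    det = a * c - b * b
    trace = a + c
    # Smaller eigenvalue of a symmetric 2x2 matrix.
    lam_min = 0.5 * trace - math.sqrt(0.25 * (a - c) ** 2 + b * b)
    return det, lam_min, trace

# Two hypothetical designs: well separated versus nearly repeated points.
design_a = [(1.0, 0.0), (1.0, 1.0)]
design_b = [(1.0, 0.5), (1.0, 0.6)]
print(criteria(design_a))
print(criteria(design_b))
```

The nearly repeated design has a much smaller determinant and minimum eigenvalue even though its trace is of similar size, which illustrates why the trace alone can be a weak indicator.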

Another derivation for the related criterion of maximization of
$|\mathbf{X}^T\boldsymbol\psi^{-1}\mathbf{X}|$ is given in Chapter 11 in Nahi's book [6]. [See (8.3.2) below.] The





When the measurements are uniformly spaced in t between 0 and $t_n$ and
n is large, $\Delta^n$ for p = 2 is

$$\Delta^n = C^n_{11}C^n_{22} - (C^n_{12})^2, \qquad C^n_{ij} \equiv \frac{1}{t_n}\int_0^{t_n} X_i(t)X_j(t)\,dt \tag{8.3.5}$$

where $X_i(t)$ is the sensitivity coefficient for parameter i and time t. The
extension to p > 2 is direct. If, in addition, there is a constraint of the
maximum $\eta$ being specified, one can modify (8.3.5) by replacing the
integrals by a typical expression of

$$C^+_{ij} = (\eta^+_{\max})^{-2}\,\frac{1}{t_n}\int_0^{t_n} X^+_i(t)X^+_j(t)\,dt \tag{8.3.6a}$$

where

$$X^+_i \equiv \frac{\beta_i}{\eta_{nom}}\frac{\partial\eta}{\partial\beta_i}, \qquad \eta^+_{\max} \equiv \frac{\eta_{\max}}{\eta_{nom}} \tag{8.3.6b}$$

which are similar to the expressions given in Section 8.2.2. Then for two
parameters with the constraints of large n with uniform spacing in t and
the maximum $\eta$ being $\eta_{\max}$, we have the criterion of maximizing

$$\Delta^+ = C^+_{11}C^+_{22} - (C^+_{12})^2 \tag{8.3.7}$$

If there are multiresponses in the experiment and measurements are
taken with uniform time spacing starting at $t = 0$, another $\Delta^+$ criterion
must be given. As above, the symbol $\Delta^+$ means that the constraint of the
same $\eta$ range is included in addition to the constraint of uniform $\Delta t$.
Examples of multiresponse cases involving transient temperature
measurements at more than one position are studied in Section 8.5. Let m denote
the number of independent responses. This case can be treated by
extending the definition of $C^+_{ij}$ given by (8.3.6a) to

$$C^+_{ij} = (\eta^+_{\max})^{-2}\,(mt_n)^{-1}\sum_{k=1}^{m}\int_0^{t_n} X^+_i(k,t)X^+_j(k,t)\,dt \tag{8.3.8}$$

where k is used to designate the kth response. By defining $C^+_{ij}$ in this
manner, $\Delta^+$ is unchanged in value if m sensors are located at the same
position (or measure the same quantity).


8.3.2 Case of Same Number of Measurements as Parameters (n = p)

One possible multiparameter case is when the number of measurements
and parameters are equal. Without prior information the minimum
number of measurements n needed to estimate p parameters is n = p. In this
case $\mathbf{X}$ is a square matrix. This results in the following simplifications for
$\Delta_{ST}$, $\Delta_{ML}$, and $\Delta_{OLS}$:

$$\Delta_{ST} = |\mathbf{X}|^2, \qquad \Delta_{ML} = \Delta_{OLS} = \frac{|\mathbf{X}|^2}{|\boldsymbol\psi|} \tag{8.3.9}$$

Note that the same criterion is given for both ML and OLS estimation for
the assumptions of 11--1011. Also observe that the optimal choice of $\mathbf{X}$
elements is affected by the accuracy of and correlation between the
measurements.

Consider now the criterion of maximizing $\Delta$, which is equivalent to
maximizing the absolute value of

$$|\mathbf{X}| = \begin{vmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ \vdots & \vdots & & \vdots \\ X_{p1} & X_{p2} & \cdots & X_{pp} \end{vmatrix} \tag{8.3.10}$$

since when $\mathbf{X}$ is a square matrix, $\Delta = |\mathbf{X}^T\mathbf{X}| = |\mathbf{X}|^2$. In the remainder of
Section 8.3.2 let $|\mathbf{X}|$ denote the absolute value of the determinant of $\mathbf{X}$.

As mentioned above there usually must be constraints on the range of
operability (a term used by Atkinson and Hunter [8]). Let R(f) define the
region of operability; the f vector has elements that can be illustrated by
writing the linear model as $\eta = \beta_1 f_1 + \cdots + \beta_p f_p$. However, not all the
values of f may be attainable, as, for example, when $f_1 = 1$. Let those
values which are available for experimentation define the attainable region
R(x), a subspace of the p-dimensional X space. The design problem then
becomes that of selecting n points in R(x) which maximize $|\mathbf{X}|$. Atkinson
and Hunter [8] have shown that the value of the determinant given by
(8.3.10) is proportional to the volume of the simplex formed by the origin
and the p experimental points. Thus an optimal design is one for which the
simplex volume is maximized. It follows, then, that for an n = p design to
be optimal, the experimental points must lie on the boundary of R(x).

8.3.2.1 Linear Examples for p = 2
Constraints of $0 < f_1 < 1$, $0 < f_2 < 1$

Consider the simple linear model

$$\eta = \beta_1 f_1 + \beta_2 f_2 \tag{8.3.10}$$

with the constraints

$$0 < f_1 < 1 \quad\text{and}\quad 0 < f_2 < 1$$

Figure 8.5 Regions of operability and attainability [R(f) and R(x)] for constraints of
$0 < f_1 < 1$ and $0 < f_2 < 1$. R(f) is the operability region for $\eta = \beta_1 f_1 + \beta_2 f_2$; R(x) is
the attainability region for $\eta = \beta_1 + \beta_2 f_2$.

Thus the region of operability R(f) is the unit square shown in Fig. 8.5.
For this case the sensitivity coefficients are $X_1 = f_1$ and $X_2 = f_2$, and the
absolute value of the determinant of $\mathbf{X}$ is

$$|\mathbf{X}| = |f_{11}f_{22} - f_{21}f_{12}| \tag{8.3.11}$$

where the vertical bars on the right side mean absolute value. For the
model $\eta = \beta_1 + \beta_2 f_2$, the attainable region R(x) is the unit vertical line at
$X_1 = 1$ shown in Fig. 8.5. From the geometrical interpretation of $\max|\mathbf{X}|$,
the optimal design for two experiments consists of 2 points in R(x) which,
together with the origin, form a triangle of greatest area. For this case the
optimal two points are the extremes of the line, $(X_1, X_2) = (1,0)$ and (1,1).
For this design $|\mathbf{X}| = 1$. If the attainable region R(x) happens to be the
operability region R(f), which is the unit square, an infinity of designs give
the same maximum value of the determinant, namely, one experiment at
$(X_1, X_2) = (0,1)$ and the other anywhere between and including (1,0) and
(1,1), or one at (1,0) and the other anywhere between and including (0,1)
and (1,1). All these designs also give a $|\mathbf{X}|$ value of unity.
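The geometric argument can be confirmed by brute force. The sketch below enumerates pairs of candidate points on a coarse grid (the grid resolution is an arbitrary assumption) and uses the fact that for n = p = 2 the absolute determinant is twice the area of the triangle formed with the origin.

```python
def abs_det(p1, p2):
    """|X| for n = p = 2: twice the area of triangle (origin, p1, p2)."""
    return abs(p1[0] * p2[1] - p2[0] * p1[1])

steps = 10                                   # grid resolution (assumption)
ticks = [i / steps for i in range(steps + 1)]

# R(x) for eta = beta_1 + beta_2*f_2: the vertical line X_1 = 1.
line = [(1.0, f2) for f2 in ticks]
best_line = max((abs_det(p, q), p, q) for p in line for q in line)

# R(x) equal to the whole unit square.
square = [(f1, f2) for f1 in ticks for f2 in ticks]
best_square = max(abs_det(p, q) for p in square for q in square)
print(best_line, best_square)
```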

Constraint of $0 < \eta < \eta_{\max}$

Frequently a more realistic constraint than on the $f_i$ values is on the range
of $\eta$. In this section the case of $0 < \eta < \eta_{\max}$ is investigated for n = p = 2 and
the linear model. Three different variations of this case are considered.

Case 1

In this case there are no constraints on the f's, so that R(f) is equal to R(x). For the
model $\eta = \beta_1 f_1 + \beta_2 f_2$ the optimal design points are found from

$$\max|\mathbf{X}| = \max|X_{11}X_{22} - X_{21}X_{12}| = \max|f_{11}f_{22} - f_{21}f_{12}| \tag{8.3.12}$$

while satisfying the condition

$$\max\eta = \eta_{\max} = \max|\beta_1 f_1 + \beta_2 f_2| \tag{8.3.13}$$

The region R(x) is the triangular region bounded on one side by the line
determined by varying $f_1$ and $f_2$ in (8.3.13); see Fig. 8.6a. The largest $|\mathbf{X}|$ value is
found by the two points and the origin comprising the largest triangle in R(x). In
this example the optimal conditions are $(\beta_1 X_1, \beta_2 X_2) = (\eta_{\max}, 0)$ and $(0, \eta_{\max})$. This
results in $\max|\mathbf{X}|$ being equal to $|\eta^2_{\max}/\beta_1\beta_2|$.

Figure 8.6 Several regions of operability and attainability for constraint of $0 < \eta < \eta_{\max}$.


Case 2

In this case the operability region R(f) is greater than the attainable region R(x).
As an example consider the model $\eta = \beta_1 + \beta_2 f_2$, for which R(x) is the vertical line
at $\beta_1 X_1 = \beta_1$ shown in Fig. 8.6b. Hence the two extreme points along R(x) together
with the origin form the maximum triangle. The maximum $|\mathbf{X}|$ is
$|(\eta_{\max} - \beta_1)/\beta_2| = \max|f_2|$, which is made larger by increasing $\max|f_2|$.

Case 3

In the last case $\eta$ is given by

$$\eta = C(\beta_1 f'_1 + \beta_2 f'_2) \tag{8.3.14}$$

where C is adjusted to make $\max|\eta| = \eta_{\max}$. In symbols, C is

$$C = \frac{\eta_{\max}}{\max|\beta_1 f'_1 + \beta_2 f'_2|} \tag{8.3.15}$$

Then the maximum $|\mathbf{X}|$ value is

$$\max|\mathbf{X}| = \max\left(C^2|f'_{11}f'_{22} - f'_{21}f'_{12}|\right) = \frac{\eta^2_{\max}\max|f'_{11}f'_{22} - f'_{21}f'_{12}|}{\left(\max|\beta_1 f'_1 + \beta_2 f'_2|\right)^2} \tag{8.3.16}$$

In order to illustrate this expression, consider the case of $\eta = C(\beta_1 + \beta_2 f'_2)$, for
which $f'_{11} = f'_{21} = 1$. If we further choose $f'_{22}$ to be equal to or greater than $f'_{12} \ge 0$, the
$\max|f'_{11}f'_{22} - f'_{21}f'_{12}|$ value is $f'_{22}$ (by setting $f'_{12} = 0$) and $\max|\mathbf{X}|$ is

$$\max|\mathbf{X}| = \frac{\eta^2_{\max}f'_{22}}{(\beta_1 + \beta_2 f'_{22})^2} \tag{8.3.17}$$

which is similar to that given for case 2. Note that now the maximum of $\max|\mathbf{X}|$ is
not simply given by the maximum value of $f'_{22}$. Rather, differentiating (8.3.17) with
respect to $\beta_2 f'_{22}$ and setting the equation equal to zero gives

$$\beta_2 f'_{22} = \beta_1 \tag{8.3.18}$$

so that $C\beta_1$ and $C\beta_2 f'_{22}$ are then both equal to $\eta_{\max}/2$. Then we find $\max(\max|\mathbf{X}|)$ to be
$\eta^2_{\max}/4\beta_1\beta_2$ or, equivalently, $C^2 f'_{22}$. (Much smaller $\max|\mathbf{X}|$ values are found for certain
other $\beta_2 f'_{22}$ values; for example, it goes to zero for both $\beta_2 f'_{22}$ approaching zero and
infinity.) The optimum two measurement points are shown in Fig. 8.6c and are
$(\beta_1 X_1, \beta_2 X_2) = (\eta_{\max}/2, 0)$ and $(\eta_{\max}/2, \eta_{\max}/2)$.

8.3.2.2 Nonlinear Example for p = 2

A model studied first by Box and Lucas [1] and later by Atkinson and
Hunter [8] is next considered. Preliminary estimates of $\beta_1 = 0.7$ and $\beta_2 = 0.2$
yield the model and sensitivities of (see Problem 8.10)

$$\eta = 1.4[\exp(-0.2t) - \exp(-0.7t)] \tag{8.3.19a}$$

$$\beta_1 X_1 = 0.7[(0.8 + 1.4t)\exp(-0.7t) - 0.8\exp(-0.2t)] \tag{8.3.19b}$$

$$\beta_2 X_2 = 0.2[(2.8 - 1.4t)\exp(-0.2t) - 2.8\exp(-0.7t)] \tag{8.3.19c}$$

which are plotted in Fig. 8.7. The operability range of $\eta$ is between 0 and
0.6; $X_1(t)$ and $X_2(t)$ are also finite but may be negative as well as positive.

Figure 8.8 R(x) for Box and Lucas [1] example. (Printed by permission of the Biometrika
Trustees.)

Note that $X_1$ and $X_2$ are uncorrelated and have maximum absolute values
at different t values. Plotting $X_2$ versus $X_1$ as in Fig. 8.8 provides the
attainable region R(x), which is a curved line in this case.

The points of the optimal design, shown in Fig. 8.8 by heavy dots
labeled A and B, together with the origin, form the triangle of maximum
area within R(x). Associated values of t are 1.23 and 6.86, which values are
affected by the choice of the parameters. Since the $\beta$'s are not precisely
known when the experiment is designed, one might wish to relate these
values to associated measured $\eta$ values. For example, at $t_2 = 6.86$, $\eta$ has
reduced to about 0.57 of its maximum value.

Atkinson and Hunter also studied optimal designs for up to 20
measurements. For these cases $\Delta^n$ is given by

$$\Delta^n = \frac{|\mathbf{X}^T\mathbf{X}|}{n^2} \tag{8.3.20}$$

Their results for maximum values of $\Delta^n$ are given in Table 8.1. In each case
the optimal design is found to consist of measurements solely at the two
times indicated above. When n is even, equal numbers of measurements at
each point maximize $\Delta^n$. For odd n, an extra measurement at either of the
two conditions gives the same maximal $\Delta^n$.


Table 8.1 Optimal Designs for up to 20 Measurements for the Box and Lucas
Model Given by (8.3.19)*

Number of        Measurements at         Maximum
measurements     t = 1.23    t = 6.86    $\Delta^n$

     2               1           1       0.1642
     3               1           2       0.1459
     3               2           1       0.1459
     4               2           2       0.1642
     5               2           3       0.1576
     5               3           2       0.1576
     6               3           3       0.1642
    10               5           5       0.1642
    20              10          10       0.1642

*Reprinted by permission from Technometrics [8].
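The entries of the table can be regenerated from the sensitivities (8.3.19). The sketch below assumes that the tabulated criterion is $|\mathbf{X}^T\mathbf{X}|/n^2$ with unscaled sensitivities $X_1$ and $X_2$, which reproduces the tabulated values; the grid search, its step, and its range are our own choices rather than the method of the cited authors.

```python
import math

def sens(t):
    """Sensitivities (X_1, X_2) of (8.3.19) at beta_1 = 0.7, beta_2 = 0.2."""
    e7, e2 = math.exp(-0.7 * t), math.exp(-0.2 * t)
    x1 = (0.8 + 1.4 * t) * e7 - 0.8 * e2
    x2 = (2.8 - 1.4 * t) * e2 - 2.8 * e7
    return x1, x2

# Grid search over pairs of times, 0 < t <= 10 (assumed range and step).
times = [0.01 * i for i in range(1, 1001)]
rows = [sens(t) for t in times]
best = max(
    ((r1[0] * r2[1] - r2[0] * r1[1]) ** 2, i, j)
    for i, r1 in enumerate(rows)
    for j, r2 in enumerate(rows[i + 1:], start=i + 1)
)
det_sq, t_a, t_b = best[0], times[best[1]], times[best[2]]

def delta_n(n_a, n_b):
    """|X^T X| / n^2 with n_a readings at t_a and n_b readings at t_b."""
    n = n_a + n_b
    return n_a * n_b * det_sq / n ** 2

print(t_a, t_b, delta_n(1, 1), delta_n(1, 2), delta_n(2, 3))
```

With replicated readings at only two times, $|\mathbf{X}^T\mathbf{X}|$ equals the product of the two replicate counts times the squared 2 x 2 determinant, which is why even n gives 0.1642 regardless of n.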


Noted by many is the conclusion that mp optimal conditions for
determining p parameters consist of m repeated optimal experiments.
However, this conclusion is not always valid, as pointed out in [8].

Note that to obtain the 20 measurements in Table 8.1, 10 different
experiments must be run. Because it is wasteful to disregard data at other
times when the transient experiment has been performed, the emphasis in
this chapter is upon many equally spaced measurements.


8.4 ALGEBRAIC EXAMPLES FOR TWO PARAMETERS AND LARGE n
8.4.1 Linear Model $\eta = \beta_1 + \beta_2\sin t$

To illustrate the case of a large number of uniformly spaced
measurements, consider the model

$$\eta = \beta_1 + \beta_2\sin t, \qquad 0 < t < t_n \tag{8.4.1}$$

Assume that the assumptions of additive, zero mean, independent errors
apply with the others denoted in 11111-11. No constraints are to be
included for $\eta$ or $t_n$.

For this case the optimal value of $t_n$ is found by maximizing (8.3.5). The
sensitivity coefficients are $X_1 = 1$ and $X_2 = \sin t$ and the $C^n_{ij}$ values are

$$C_{11} = 1, \qquad C_{22} = \frac{1}{2} - \frac{1}{4t_n}\sin 2t_n, \qquad C_{12} = \frac{1}{t_n}(1 - \cos t_n) \tag{8.4.2}$$

These expressions are plotted in Fig. 8.9 along with $\Delta^n$. The optimal $t_n$ is
5.5, which is considerably larger than 2.25, the optimal $t_n$ for estimating
only $\beta_2$.
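The optimal duration quoted above can be checked with the closed-form integrals of (8.4.2). This is a minimal sketch; the scan range of 0 to 20 and the step are assumptions of ours.

```python
import math

def delta_n(t_n):
    """Delta^n = C11*C22 - C12**2 for eta = beta_1 + beta_2*sin(t)."""
    c11 = 1.0
    c22 = 0.5 - math.sin(2.0 * t_n) / (4.0 * t_n)
    c12 = (1.0 - math.cos(t_n)) / t_n
    return c11 * c22 - c12 ** 2

step = 0.001
grid = [step * i for i in range(1, 20001)]   # 0 < t_n <= 20 (assumption)
t_best = max(grid, key=delta_n)
print(t_best, delta_n(t_best), delta_n(2.25))
```

At the single-parameter optimum near 2.25 the two-parameter criterion is far below its peak, because the constant and the sine are still strongly correlated over that short interval.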

8.4.2 Exponential Models with One Linear and One Nonlinear Parameter

Exponentially decaying solutions commonly occur in science and
engineering. One is

$$\eta = \beta_1\exp(-\beta_2 t) \tag{8.4.3}$$

This could describe the temperature in a fin (Section 7.5.1) or that of a
cooling billet (Section 7.5.2). [$T_\infty$ would be assumed known in (7.5.4) and
(7.5.10).] For the assumptions denoted 11111-11 and no constraints on $\eta$ or
t, the criterion to maximize again is $\Delta^n$, given by (8.3.5). The sensitivities are

$$X_1 = \exp(-t^+), \qquad \frac{\beta_2}{\beta_1}X_2 = -t^+\exp(-t^+) \tag{8.4.4}$$

where $t^+ = \beta_2 t$. Functions similar to $X_1$ and $X_2$ are shown in Fig. 8.4.

Figure 8.9 Sensitivity curves for the model $\eta = \beta_1 + \beta_2\sin t$.

If measurements were desired at only two locations, the optimal
locations are at $t^+ = 0$ and 1, the former being where $X_1$ is a maximum and the
latter where $|X_2|$ is. One can demonstrate this by plotting $X_2$ versus $X_1$ as
in Fig. 8.8 and then finding the maximum triangle including the origin. If
only two measurements are to be taken from each experiment in a series of
experiments, the measurements should be made at just these two times in
all the experiments.

The integrals $C^n_{ij}$ associated with $X_1$ and $X_2$ are plotted in Fig. 8.10 along
with $\Delta^n$. A large number n of equally spaced observations in $0 < t < t_n$ is
used. The optimal duration of an experiment for determining both $\beta_1$ and
$\beta_2$ is the time at which $\Delta^n$ is a maximum, $t^+_n = 1.191$. This maximum occurs
between the times of the maxima of $t^+ = 0$ for $C_{11}$ and $t^+ = 1.69$ for $C_{22}$.
These latter times are the optimal values if only $\beta_1$ and only $\beta_2$ were to be
estimated.

A model similar to (8.4.3) is

$$\eta = \beta_1[1 - \exp(-\beta_2 t)] \tag{8.4.5}$$

This could represent the same physical cases mentioned above except now
$T_0$ is assumed known. Both models are illustrated in Example 8.4.1.

Figure 8.11 Sensitivity curves for the model $\eta = \beta_1(1 - e^{-\beta_2 t})$ with no constraint on
maximum $\eta$.

Sensitivities for (8.4.5) are

$$X_1 = 1 - \exp(-t^+), \qquad \frac{\beta_2}{\beta_1}X_2 = t^+\exp(-t^+) \tag{8.4.6}$$

where $t^+ = \beta_2 t$. The integrals $C^n_{ij}$ and $\Delta^n$ are depicted in Fig. 8.11. The
maximum of $\Delta^n$ is at $t^+_n = 7.184$, which is between the values of $t^+_n = \infty$ and
1.69, the maxima values for $C_{11}$ and $C_{22}$.

It is significant to note that at time $t^+ = 1.191$, the optimal value for
model (8.4.3), the $\Delta^n$ value shown in Fig. 8.11 is still very small. Hence an
experiment design that is optimal for (8.4.3) is very poor for the similar
exponential model, (8.4.5).
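Both optimal durations can be recovered with one small routine. The sketch assumes the criterion $\Delta^n = C_{11}C_{22} - C_{12}^2$ with $C_{ij} = t_n^{-1}\int_0^{t_n}X_iX_j\,dt$ and sensitivities written in terms of $t^+ = \beta_2 t$ (with $X_2$ scaled by $\beta_2/\beta_1$, which does not move the maximum); the routine name, trapezoidal integration, and scan limit are ours.

```python
import math

def best_duration(x1, x2, t_max, h=0.001):
    """Return (t_n, value) maximizing C11*C22 - C12**2 on (0, t_max].

    C_ij = (1/t_n) * integral_0^t_n X_i X_j dt is accumulated with the
    trapezoidal rule on a uniform grid of spacing h.
    """
    i11 = i22 = i12 = 0.0
    a_prev, b_prev = x1(0.0), x2(0.0)
    best_t, best_val = None, -1.0
    for k in range(1, int(round(t_max / h)) + 1):
        t = k * h
        a, b = x1(t), x2(t)
        i11 += 0.5 * h * (a_prev * a_prev + a * a)
        i22 += 0.5 * h * (b_prev * b_prev + b * b)
        i12 += 0.5 * h * (a_prev * b_prev + a * b)
        val = (i11 * i22 - i12 * i12) / (t * t)
        if val > best_val:
            best_t, best_val = t, val
        a_prev, b_prev = a, b
    return best_t, best_val

# Decaying model (8.4.3).
decay = best_duration(lambda t: math.exp(-t),
                      lambda t: -t * math.exp(-t), 20.0)
# Rising model (8.4.5).
rise = best_duration(lambda t: 1.0 - math.exp(-t),
                     lambda t: t * math.exp(-t), 20.0)
print(decay, rise)
```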

Example 8.4.1

Consider again the cooling billet example studied in Example 8.2.4 and other
sections. The model can be in the following forms:

$$T - T_\infty = (T_0 - T_\infty)\exp(-\beta t) \tag{a}$$

$$T - T_0 = (T_\infty - T_0)[1 - \exp(-\beta t)] \tag{b}$$

Consider two cases. For the first case, (a), assume temperature $T_\infty$ is accurately
known and $(T_0 - T_\infty)$ and $\beta$ are the parameters. This describes the billet problem
because $T_\infty$ is accurately known. The second case corresponds to (b), for which the
initial temperature $T_0$ is considered to be known and $(T_\infty - T_0)$ and $\beta$ are now $\beta_1$
and $\beta_2$, respectively. The optimum durations for both cases for a large number of
equally spaced measurements from 0 to $t_n$ are to be found. No constraints on T or t
are to be used. Assume that the measurements satisfy the standard conditions
denoted 11111-11. An estimate of $\beta$ in (a) and (b) is 2.7/hr.

Solution

For (a) the dependent variable can be considered to be $T - T_\infty$; this model is
similar to (8.4.3) and the optimal duration is $t_n = 1.191/\beta \approx 1.191/2.7 = 0.44$ hr. See
Fig. 8.10. For (b) the dependent variable is $T - T_0$, which is analogous to $\eta$ of
(8.4.5); from Fig. 8.11 the optimal duration is $t_n = 7.185/\beta \approx 2.66$ hr. The duration
of the optimal experiment is relatively long when $T_\infty$ is unknown; in fact, at
$t^+ = 7.185$, T = 199.92 compared to the value of 200 which is approached as $t \to \infty$.


8.5 OPTIMAL PARAMETER ESTIMATION INVOLVING THE PARTIAL
DIFFERENTIAL EQUATION OF HEAT CONDUCTION

To illustrate design of optimal experiments in more complex cases, studied
next are cases involving the partial differential equation of heat
conduction. Considerations not encountered in the algebraic models given above
enter when the model involves this equation. For example, space as well as
time dependence is met. Thus in addition to finding optimal duration of
experiments, optimal locations of sensors are needed. Furthermore, the
response at any location is affected by the prescribed time variation of
boundary conditions. Another significant aspect of estimation involving
partial differential equations is that the parameters can be present in the
equation and/or in the boundary conditions.

The criteria derived in Appendix 8A apply to estimation involving
ordinary and partial differential equations. For simplicity, the cases
considered in this section were selected because they have solutions in terms of
known functions; similar methods of analysis can be used, however, even
if the equations must be solved numerically, as commonly occurs for
nonlinear differential equations.

The criterion utilized is that of maximizing $\Delta = |\mathbf{X}^T\mathbf{X}|$ subject to
appropriate constraints. This is the condition to employ when the standard
conditions denoted 11111-11 apply. When many transient measurements
are obtained using a single sensor, the standard assumption of independent
measurement errors may not be valid. If the correlation parameters are not
known, however, it is still reasonable to choose the maximization of $|\mathbf{X}^T\mathbf{X}|$
as the criterion.

The transient heat conduction equation for heat flow in a plane wall
with constant thermal conductivity k and density-specific heat product c
can be written as

$$k\frac{\partial^2 T}{\partial x^2} = c\frac{\partial T}{\partial t} \quad\text{or}\quad \alpha\frac{\partial^2 T}{\partial x^2} = \frac{\partial T}{\partial t} \tag{8.5.1}$$

where $\alpha = k/c$ is called the thermal diffusivity. The differential equation
can be written in terms of the single parameter $\alpha$, but sometimes there are
boundary conditions which involve k. In the following analyses when only
$\alpha$ appears, it is the parameter, but when k appears in boundary conditions,
k and c are the parameters. The parameters k and c are chosen because of
their physical significance, although others can be used as indicated in
Section 7.10.

For the standard assumptions 11111-11, a fixed large number of equally
spaced observations, and a constraint on the maximum range of $\eta$ [which is
the increase in T of (8.5.1)], the criterion for one parameter is to maximize
$\Delta^+$ given by (8.2.13). If the same conditions are valid for two parameters,
$\Delta^+$ is given by (8.3.7) and (8.3.8), where the i and j subscripts could be 1 and 2 with
the subscript 1 referring to k and the subscript 2 to c.

Several examples are given in this section. First considered are
semi-infinite bodies for which the body starts at $x = 0$ and continues indefinitely in
the plus x direction. Although such bodies do not exist in nature, many
heat-conducting bodies can be so modeled, at least for some period of
time. Also considered are finite bodies. Temperature measurements in a
finite plate heated on one side and insulated on the other are tabulated in
Table 7.14 and illustrated in Fig. 7.17. These measurements also illustrate a
semi-infinite body; until time 6 sec, the temperatures in the plate are the
same as those that would be measured if the plate were thicker.

8.5.1 Semi-Infinite Body Examples

8.5.1.1 Temperature Boundary Condition (Single Parameter)

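Suppose that the temperature in a semi-infinite body is initially uniform at
the temperature $T_0$. Let the temperature at x = 0 have a step increase to
$T_\infty$. The temperature in dimensionless form can be given as [9]

$$T^+ \equiv \frac{T - T_0}{T_\infty - T_0} = \operatorname{erfc}\left[(4t^+)^{-1/2}\right], \qquad t^+ \equiv \frac{\alpha t}{x^2} \tag{8.5.2}$$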



446 


CHAPTER 8 DESIGN OF OPTIMAL EXPERIMENTS 


where erfc(z) is called the complementary error function and is the
integral,

$$\operatorname{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^\infty e^{-u^2}\,du \tag{8.5.3}$$

Note that although T is a function of x and t, the dimensionless temperature
can be plotted in terms of the single dimensionless variable $t^+$. For
temperature boundary conditions involving the heat conduction equation
(8.5.1), the only parameter that enters is $\alpha$ (if temperatures $T_0$ and $T_\infty$ are
not parameters). Note that T is a nonlinear function of $\alpha$. Thermal
diffusivity ($\alpha$) is also called a "property" and has been estimated for many
materials by many different experimentalists, some of whom have used
(8.5.2) as their model.

The solution given by (8.5.2) has a natural constraint on the range of
temperature T because T must be between $T_0$ and $T_\infty$. Even though at
some interior location x and at some time t the temperature may be much
less than $T_\infty$, the temperature near x = 0 approaches $T_\infty$. Instead of
requiring the temperature at x to reach the same maximum value at the
end of the experiment, we apply the constraint at the heated surface (x = 0)
where the temperature rise is the greatest. Hence the "nominal" rise in T is
taken to be $T_\infty - T_0$.

The dimensionless $\alpha$ sensitivity is

$$X_\alpha^+ \equiv \frac{\alpha}{T_\infty - T_0}\frac{\partial T}{\partial \alpha} = (4\pi t^+)^{-1/2} \exp\left[-(4t^+)^{-1}\right] \tag{8.5.4}$$

The $\Delta^+$ function for a large number of uniformly spaced measurements
starting at t = 0 and for the maximum T in the body being $T_\infty$ is

$$\Delta^+ = (t_n^+)^{-1} \int_0^{t_n^+} (X_\alpha^+)^2\,dt^+ \tag{8.5.5}$$

Note that $\Delta^+$ is a function of $t_n^+$, the maximum value of $t^+$.

Plotted in Fig. 8.12 are $T^+$ and $X_\alpha^+$ versus $t^+$ and $\Delta^+$ versus $t_n^+$. For a
given location x for measurement of temperatures, the sensitivity $X_\alpha^+$ has a
maximum at $t^+ = \alpha t/x^2 = 0.5$, at which time $T^+ = 0.3173$. Hence if
only one measurement is to be taken from those produced by one sensor, it
should be selected at a time corresponding to $t^+ = 0.5$; if instead the time
of measurement is fixed but any one location is to be selected, then the
optimum x is $(2\alpha t)^{1/2}$. If many measurements equally spaced in time are
used, the optimal duration for using data is when $\Delta^+$ is maximized, which is
time $t_n^+ = 1.2$ (when $T^+ \approx 0.5$). If a good estimate of $\alpha$ is not initially
available, the optimal times can be estimated using the corresponding $T^+$
values indicated.
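The two optimal times quoted above can be checked numerically. The following is a minimal sketch, assuming the sensitivity has the closed form obtained by differentiating the erfc solution with respect to the diffusivity; the function names, the search grid, and the midpoint-rule integration are choices of this sketch, not of the text.

```python
import math

def temperature(tp):
    """Dimensionless temperature erfc[(4 t+)^(-1/2)] for a step in surface temperature."""
    return math.erfc(1.0 / (2.0 * math.sqrt(tp)))

def alpha_sensitivity(tp):
    """Dimensionless alpha sensitivity (4 pi t+)^(-1/2) exp(-1/(4 t+))."""
    return math.exp(-1.0 / (4.0 * tp)) / math.sqrt(4.0 * math.pi * tp)

def delta_plus_curve(t_max=3.0, steps=3000):
    """Pairs (t_n+, Delta+), with Delta+ = (1/t_n+) * integral of X^2 (midpoint rule)."""
    h = t_max / steps
    running = 0.0
    curve = []
    for i in range(steps):
        running += alpha_sensitivity((i + 0.5) * h) ** 2 * h
        t_n = (i + 1) * h
        curve.append((t_n, running / t_n))
    return curve

# Hypothetical search grid in dimensionless time (an assumption of this sketch).
grid = [0.01 * i for i in range(1, 301)]
t_single = max(grid, key=alpha_sensitivity)       # best time for one measurement
t_duration, delta_max = max(delta_plus_curve(), key=lambda pair: pair[1])

print(f"single measurement: t+ = {t_single:.2f}, T+ = {temperature(t_single):.4f}")
print(f"many measurements:  t_n+ = {t_duration:.2f}, Delta+ = {delta_max:.4f}")
```

The single-measurement optimum falls on the grid point at 0.5, and the duration that maximizes the criterion lies close to the value of about 1.2 read from the figure.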








8.5.1.2 Constant Heat Flux Boundary Condition (Two Parameters)

If a flat electric heater is affixed to the surface of a large body and a
constant current is passed through the heater, the surface heat flux into the
body is constant. The surface temperature will respond in a similar manner
from 3.3 to 12 sec as that shown in Fig. 7.17. If the body is semi-infinite
with an initial temperature $T_0$ and is subjected to the constant heat flux q,
the temperature response can be written as

$$T^+ \equiv \frac{T - T_0}{qx/k} = 2 (t^+)^{1/2}\, \operatorname{ierfc}\left[(4t^+)^{-1/2}\right] \tag{8.5.6}$$

$$\operatorname{ierfc}(z) = \pi^{-1/2} \exp(-z^2) - z\, \operatorname{erfc}(z) \tag{8.5.7}$$

where $t^+$ is again $\alpha t/x^2$. In this case T is a nonlinear function of the two
parameters, k and c (since $\alpha = k/c$). Another combination of parameters is
$\alpha$ and $k^{-1}$; in this case T is nonlinear in $\alpha$ but linear in $k^{-1}$. [See (7.10.11).]
Dimensionless sensitivities for the parameters k and c are

$$X_1^+ \equiv \frac{k}{qx/k}\frac{\partial T}{\partial k} = \operatorname{erfc}\left[(4t^+)^{-1/2}\right] - (t^+/\pi)^{1/2} \exp\left[-(4t^+)^{-1}\right] \tag{8.5.8}$$

$$X_2^+ \equiv \frac{c}{qx/k}\frac{\partial T}{\partial c} = -(t^+/\pi)^{1/2} \exp\left[-(4t^+)^{-1}\right] \tag{8.5.9}$$

[Verify that the relation given by (7.10.9) is satisfied by (8.5.6), (8.5.8), and
(8.5.9).] These two sensitivities are depicted in Fig. 8.13. $X_1^+$ starts positive
and goes negative whereas $X_2^+$ is always negative and larger in magnitude.
At the time that $X_1^+$ goes to zero, the temperature T is insensitive to (i.e.,
unchanged by) small changes in k. One significance of $X_2^+$ being larger in
magnitude than $X_1^+$ is that, if only k or c were to be estimated, there
would be on the average less relative uncertainty in c than in k.

It is also instructive to evaluate T and the sensitivities at the surface
(x = 0); we get

$$T(0,t) - T_0 = 2q\left(\frac{t}{\pi k c}\right)^{1/2} \tag{8.5.10}$$

$$k\frac{\partial T(0,t)}{\partial k} = -q\left(\frac{t}{k c \pi}\right)^{1/2} = c\frac{\partial T(0,t)}{\partial c} \tag{8.5.11}$$

Since the two sensitivities at x = 0 are proportional, $\Delta$ is equal to zero and
measurements at x = 0 alone, no matter how accurate, cannot permit the
independent estimation of both parameters.
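The bracketed verification can be done numerically. The sketch below assumes that the relation referred to reduces, in dimensionless form, to the temperature plus the two sensitivities summing to zero, and it uses closed forms obtained by differentiating the standard constant-flux solution with respect to k and c; the function and variable names are illustrative only.

```python
import math

def constant_flux_response(tp):
    """Return (T+, X1+, X2+) at dimensionless time tp = alpha t / x^2."""
    z = 1.0 / (2.0 * math.sqrt(tp))
    gauss = math.sqrt(tp / math.pi) * math.exp(-z * z)
    temperature = 2.0 * gauss - math.erfc(z)   # equals 2 sqrt(t+) ierfc(z)
    x_k = math.erfc(z) - gauss                 # k sensitivity
    x_c = -gauss                               # c sensitivity
    return temperature, x_k, x_c

for tp in (0.1, 1.0, 5.0):
    temperature, x_k, x_c = constant_flux_response(tp)
    residual = temperature + x_k + x_c
    print(f"t+ = {tp:4.1f}  T+ = {temperature:8.5f}  X1+ = {x_k:8.5f}  "
          f"X2+ = {x_c:8.5f}  sum = {residual:.1e}")
```

In this sketch the k sensitivity changes sign between the early and late times while the c sensitivity stays negative and larger in magnitude, which is the behavior described for Fig. 8.13.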






Figure 8.13 Sensitivities $X_1^+$ and $X_2^+$ for k and c for semi-infinite body with q = constant.


Because the sensitivities for x>0 are not proportional as shown in Fig. 
8.13, any interior location can be used to provide data for estimating k and 
c. Not all locations or durations of the experiments are equally effective,
however. In order to find a meaningful optimal experiment, a constraint 
for the temperature rise is needed because as shown by (8.5.10) T goes to 
infinity as t increases without limit. From physical considerations only a 
finite maximum temperature is possible (materials melt or vaporize). 

The constraint of the same maximum temperature rise can be introduced
using (8.3.6, 7). Let the nominal value of $\eta$ be equal to qx/k; q is analogous
to the adjustable constant C in Section 8.2. The quantity $\eta_{\max}$ in (8.3.6b) is
the maximum rise $T_{\max} - T_0$; thus $\eta_{\max}^+$, also in (8.3.6b), is
$(T_{\max} - T_0)/(qx/k)$, which from (8.5.10) is

$$\frac{T_{\max} - T_0}{qx/k} = \frac{2q}{qx/k}\left(\frac{t_n}{\pi k c}\right)^{1/2} = 2\left(\frac{k t_n}{\pi c x^2}\right)^{1/2} = 2 (t_n^+/\pi)^{1/2} \tag{8.5.12}$$

The maximum temperature, which occurs at x = 0 and at time $t_n$, is made to
be the same in each case by appropriately adjusting q. (The x given
explicitly in (8.5.12) refers to the location x > 0 of a sensor.)

A plot of $\Delta^+$ defined by (8.3.7) versus $t_n^+$ for one interior measurement
yields a maximum $\Delta^+$ value of 0.000167 at $t_n^+ = \alpha t_n/x^2 = 8.5$. Again this
result can be interpreted in two ways. First, for a given location of the
temperature sensor, say, at x = 0.02 m in an iron block ($\alpha = 2 \times 10^{-5}$
m²/sec), the optimal duration is $t_n = 8.5 x^2/\alpha = 170$ sec. Second, for the
same example if the optimal duration were desired to be 170 sec, then the
sensor should be located 0.02 m from the heated surface.
It is instructive to study the case of two sensors, each producing equally
spaced, independent measurements starting at t = 0. If two thermocouples
are located at the same x, the use of $C_{ij}^+$ [defined by (8.3.8)] in (8.3.7)
would give the same optimal value of $\Delta^+$. If a search is made for the
optimal two locations, they are found to be at x = 0 and at the x > 0 for
which $t_n^+ = 1.5$; the associated $\Delta^+$ value is 0.00263, which is almost
16 times the maximal value mentioned above for one sensor. Hence a
design involving two sensors positioned as indicated would result in much
greater accuracy in the estimates of k and c than if only a single sensor
were used or if two were used at the same x.
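The one-sensor and two-sensor optima quoted above can be reproduced with a short numerical search. This is a sketch under stated assumptions: the sensitivities are the closed forms obtained by differentiating the constant-flux solution, the normalizing temperature rise is the surface value at the final time, each component is a time average of a sensitivity product evaluated by Simpson's rule, and a sensor at x = 0 contributes 1/8 to every component. All names and the search grid are hypothetical.

```python
import math

def sensitivities(tp):
    """(X1+, X2+) at an interior sensor for constant q; tp = alpha t / x^2."""
    if tp <= 0.0:
        return 0.0, 0.0
    z = 1.0 / (2.0 * math.sqrt(tp))
    gauss = math.sqrt(tp / math.pi) * math.exp(-z * z)
    return math.erfc(z) - gauss, -gauss

def c_components(t_n, panels=1000):
    """C11+, C22+, C12+ for one interior sensor (Simpson's rule, panels even)."""
    h = t_n / panels
    s11 = s22 = s12 = 0.0
    for i in range(panels + 1):
        weight = 1.0 if i in (0, panels) else (4.0 if i % 2 else 2.0)
        x_k, x_c = sensitivities(i * h)
        s11 += weight * x_k * x_k
        s22 += weight * x_c * x_c
        s12 += weight * x_k * x_c
    eta_max_sq = 4.0 * t_n / math.pi           # square of 2 (t_n+/pi)^(1/2)
    scale = h / 3.0 / (t_n * eta_max_sq)
    return s11 * scale, s22 * scale, s12 * scale

def delta_one_sensor(t_n):
    c11, c22, c12 = c_components(t_n)
    return c11 * c22 - c12 ** 2

def delta_two_sensors(t_n):
    # Second sensor at x = 0, where each normalized component equals 1/8;
    # averaging keeps the total number of measurements the same.
    c11, c22, c12 = (0.5 * (c + 0.125) for c in c_components(t_n))
    return c11 * c22 - c12 ** 2

durations = [0.25 * i for i in range(2, 61)]   # t_n+ from 0.5 to 15
best_one = max(durations, key=delta_one_sensor)
best_two = max(durations, key=delta_two_sensors)
print(f"one sensor:  t_n+ = {best_one:5.2f}  Delta+ = {delta_one_sensor(best_one):.6f}")
print(f"two sensors: t_n+ = {best_two:5.2f}  Delta+ = {delta_two_sensors(best_two):.6f}")

# Converting the dimensionless optimum to seconds for the iron-block example.
x_sensor, alpha = 0.02, 2.0e-5                 # m and m^2/sec, as in the text
print(f"duration for t_n+ = 8.5: {8.5 * x_sensor ** 2 / alpha:.0f} sec")
```

Because only the dimensionless time enters, scanning the duration at a fixed sensor depth is equivalent to scanning the sensor depth at a fixed duration.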

8.5.1.3 Heat Flux Boundary Condition to Cause a Step Change in
Surface Temperature

Temperatures inside the semi-infinite body change most for a given
temperature range when the heated surface takes a maximal step increase.
Both k and c can be estimated if this change in temperature is caused by a
prescribed heat flux. (If the surface temperature is the specified boundary
condition, only $\alpha$ can be estimated. See Section 8.5.1.1.) When the
temperatures change most, the sensitivity coefficients would also be expected to
be greatest in magnitude. [See (7.10.9) for a relation between T, $\partial T/\partial k$,
and $\partial T/\partial c$.] We would anticipate for this reason that this case may have
the optimal heat flux boundary condition.

A surface heat flux having the time dependence

$$q = a (\pi t)^{-1/2} \tag{8.5.13}$$

produces a step rise in surface temperature of $T_\infty - T_0$. The constant a is
related to $T_\infty - T_0$ by $a = (kc)^{1/2}(T_\infty - T_0)$. The temperature distribution [9]
and the k and c sensitivities are

$$T^+ \equiv \frac{T(x,t) - T_0}{T_\infty - T_0} = \operatorname{erfc}\left[(4t^+)^{-1/2}\right] \tag{8.5.14}$$

$$X_1^+ \equiv \frac{k}{T_\infty - T_0}\frac{\partial T}{\partial k} = -\frac{T^+}{2} + (4\pi t^+)^{-1/2} \exp\left[-(4t^+)^{-1}\right] \tag{8.5.15}$$

$$X_2^+ \equiv \frac{c}{T_\infty - T_0}\frac{\partial T}{\partial c} = -\frac{T^+}{2} - (4\pi t^+)^{-1/2} \exp\left[-(4t^+)^{-1}\right] \tag{8.5.16}$$





In Fig. 8.3a, is the X-^ curve and Cjl is the A" curve in Fig. 8.3Z>. 
Because the above case has a limited range of T, a constraint on T is 
incorporated in the solution. 

An optimal location for one sensor again cannot be at x = 0 as the
sensitivities are proportional there. The optimum for one sensor occurs at
$t_n^+ = 10$, at which time $\Delta^+$ is the maximal value of 0.00232. If two sensors
are optimally placed, they are at x = 0 and at the x corresponding to
$t_n^+ = \alpha t_n/x^2 = 1.25$, where $\Delta^+$ is the much larger value of 0.0113. Again two
sensors located as indicated are much more effective than one.
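The same kind of search can be sketched for the flux history that steps the surface temperature. The assumptions here are that the sensitivities are those obtained by differentiating the erfc solution with the flux held prescribed, that the normalized maximum temperature rise is unity, and that a sensor at x = 0 contributes 1/4 to every component (both surface sensitivities equal minus one half). Function names and the grid are illustrative.

```python
import math

def sensitivities(tp):
    """(X1+, X2+) when the prescribed flux steps the surface temperature."""
    if tp <= 0.0:
        return 0.0, 0.0
    half_rise = 0.5 * math.erfc(1.0 / (2.0 * math.sqrt(tp)))
    gauss = math.exp(-1.0 / (4.0 * tp)) / math.sqrt(4.0 * math.pi * tp)
    return gauss - half_rise, -gauss - half_rise

def c_components(t_n, panels=1000):
    """C11+, C22+, C12+ for one interior sensor (Simpson's rule, panels even)."""
    h = t_n / panels
    s11 = s22 = s12 = 0.0
    for i in range(panels + 1):
        weight = 1.0 if i in (0, panels) else (4.0 if i % 2 else 2.0)
        x_k, x_c = sensitivities(i * h)
        s11 += weight * x_k * x_k
        s22 += weight * x_c * x_c
        s12 += weight * x_k * x_c
    scale = h / 3.0 / t_n                      # normalized temperature rise is 1
    return s11 * scale, s22 * scale, s12 * scale

def delta_one_sensor(t_n):
    c11, c22, c12 = c_components(t_n)
    return c11 * c22 - c12 ** 2

def delta_two_sensors(t_n):
    # The second sensor sits at x = 0, where every component equals 1/4.
    c11, c22, c12 = (0.5 * (c + 0.25) for c in c_components(t_n))
    return c11 * c22 - c12 ** 2

durations = [0.25 * i for i in range(2, 61)]   # t_n+ from 0.5 to 15
best_one = max(durations, key=delta_one_sensor)
best_two = max(durations, key=delta_two_sensors)
print(f"one sensor:  t_n+ = {best_one:5.2f}  Delta+ = {delta_one_sensor(best_one):.5f}")
print(f"two sensors: t_n+ = {best_two:5.2f}  Delta+ = {delta_two_sensors(best_two):.5f}")
```

The search should land near the dimensionless times of 10 and 1.25 quoted in the text, with the two-sensor criterion several times larger than the one-sensor value.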

8.5.1.4 Summary of Optimal Designs for Semi-Infinite Bodies 
Subjected to Heat Flux Boundary Conditions 

A summary of results for the heat flux boundary condition is given in 
Table 8.2. Cases 1 and 4 are for a single sensor at x = 0; precise measure- 
ments at only that location cannot be used to estimate independently k 
and c. However, if only /c or c is estimated, x = 0 is the optimal location. 
Also given are cases 2 and 5 which are for a single sensor at x>0. The 
optimal results are given by cases 3 and 6 for two sensors. 


Table 8.2 Summary of Maximum Values of Δ+ for Semi-Infinite
Bodies with Heat Flux Boundary Condition. Δ+ and
the Cij+ are Normalized to Contain the Same Number
of Measurements in Each Case

      Boundary         Location          Max.       Time of Max. Δ+,    Components of Maximum Δ+
Case  Condition        of Sensors        Δ+         t_n+ = αt_n/x^2     C11+     C22+     C12+

1     q = const.       x = 0             0          -                   0.125    0.125    0.125
2     q = const.       x > 0             0.000167   8.5                 0.0181   0.1119   0.0431
3     q = const.       x = 0 and x > 0   0.00263    1.5                 0.0631   0.0981   0.0597
4     q for T = T∞     x = 0             0          -                   0.25     0.25     0.25
5     q for T = T∞     x > 0             0.002317   10.0                0.0585   0.2325   0.1062
6     q for T = T∞     x = 0 and x > 0   0.0113     1.25                0.1275   0.2003   0.1192


The covariance matrix of the estimated parameter vector b having
elements k and c is given by $(X^T X)^{-1}\sigma^2$ provided standard assumptions of
additive, zero mean, constant variance, independent normal errors apply
(more specifically, assumptions denoted 11111-11). Then for n being the
total number of measurements, the covariance of b is

$$\operatorname{cov}(b) \approx \frac{\sigma^2}{n (T_{\max} - T_0)^2} \begin{bmatrix} C_{11}^+/k^2 & C_{12}^+/kc \\ C_{12}^+/kc & C_{22}^+/c^2 \end{bmatrix}^{-1} \tag{8.5.17}$$

$$= \frac{\sigma^2}{n \Delta^+ (T_{\max} - T_0)^2} \begin{bmatrix} k^2 C_{22}^+ & -kc\,C_{12}^+ \\ -kc\,C_{12}^+ & c^2 C_{11}^+ \end{bmatrix} \tag{8.5.18}$$

Values of the $C_{ij}^+$'s are given in the last three columns of Table 8.2. We can
use them, for example, to give the approximate standard deviation of k as

$$\frac{\text{est. s.e.}(k)}{k} \approx \left[\frac{C_{22}^+}{n \Delta^+}\right]^{1/2} \frac{\sigma}{T_{\max} - T_0} \tag{8.5.19}$$

The second factor in (8.5.19) can be considered to be the relative measurement
error in the temperature, and the factor with the square root is an
amplification factor for the conductivity. The smaller the amplification, the more
precise are the k estimates. For n = 25 the amplification factor is 5.2 for
case 2 and 0.84 for case 6. This corroborates that larger values of $\Delta^+$ result
in experiments that permit estimating parameters with greater accuracy.
Another use for expressions such as (8.5.19) is in determining the number n
of measurements needed for specified accuracy.
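The amplification factors quoted for n = 25 follow directly from the tabulated components. A minimal sketch, assuming the factor is the square root of the c-c component divided by n times the criterion value; the helper for the required number of measurements and its example inputs are hypothetical illustrations.

```python
import math

def amplification(c22, delta, n):
    """Square-root factor multiplying the relative temperature error for k."""
    return math.sqrt(c22 / (n * delta))

def measurements_needed(c22, delta, rel_temperature_error, target_rel_error_k):
    """Smallest n for which the predicted relative error in k meets the target."""
    ratio = rel_temperature_error / target_rel_error_k
    return math.ceil(c22 / delta * ratio ** 2)

case_2 = amplification(0.1119, 0.000167, 25)   # constant flux, one interior sensor
case_6 = amplification(0.2003, 0.0113, 25)     # step surface temperature, two sensors
print(f"case 2: {case_2:.2f}   case 6: {case_6:.2f}")

# Hypothetical target: 1% relative temperature error, 0.5% relative error in k.
print(measurements_needed(0.2003, 0.0113, 0.01, 0.005))
```

The two printed factors round to the 5.2 and 0.84 given in the text.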

Conclusions that can be drawn from Table 8.2 for estimating k and c are
as follows.


1. A single sensor at x = 0 is not permitted.

2. When one sensor at x > 0 is used, the optimal time $t_n^+$ is about 10 for
both heat flux boundary conditions.

3. When two sensors are used, one should be at x = 0 and the other at
x > 0. Note that the optimal conditions for one sensor are not repeated.

4. The heat flux condition causing a step change in surface temperature
(cases 5 and 6) is much superior to the constant flux condition (cases 2 and
3).

5. The optimum of the optimal designs given in Table 8.2 is case 6.
Hence, when k and c are estimated in a semi-infinite body, this would
be the recommended design. It can be shown [13] that if more than
two sensors are to be used, about half should be placed at x = 0 and
the remainder at $x = (\alpha t_n/1.25)^{1/2}$.

6. The number of optimal conditions can be less than, equal to, or more
than the number of parameters. For a given heat flux boundary
condition and one sensor, $\Delta^+$ is maximized only with respect to $t_n^+$.
Also for given q(t) but with two sensors, $\Delta^+$ is maximized with respect
to two parameters relating to the location of the sensors. Finally,
for arbitrary q(t), $\Delta^+$ can be maximized by varying the function q(t),
which involves an infinite set of functions, two of which are illustrated
in Table 8.2. Of all these possible functions none can yield larger $\Delta^+$
values for semi-infinite bodies than the heat flux function of cases 5
and 6.


8.5.2 Finite Body Examples 


8.5.2.1 Sinusoidal Initial Temperature in a Plate

Consider for the first finite body example the case of a plate which has a
sinusoidal initial temperature and zero temperature boundary conditions,

$$T(x,0) = T_{\max} \sin\left(\frac{\pi x}{L}\right), \qquad T(0,t) = 0, \qquad T(L,t) = 0 \tag{8.5.20}$$

The solution of (8.5.1) with these conditions is

$$T(x,t) = T_{\max} \exp(-\pi^2 t^+) \sin\left(\frac{\pi x}{L}\right), \qquad t^+ \equiv \frac{\alpha t}{L^2} \tag{8.5.21}$$

Again for temperature boundary conditions, only the thermal diffusivity $\alpha$
appears, not k and c independently. The dimensionless $\alpha$ sensitivity is

$$X_\alpha^+ \equiv \frac{\alpha}{T_{\max}}\frac{\partial T}{\partial \alpha} = -\pi^2 t^+ \exp(-\pi^2 t^+) \sin\left(\frac{\pi x}{L}\right) \tag{8.5.22}$$

This expression has maximal magnitude at x/L = 0.5 and $\pi^2 t^+ = 1$ (replace
t in Fig. 8.4 by $\pi^2 t^+$ to see the $t^+$ dependence). Consequently if only one
sensor location is chosen, it should be at x/L = 0.5. Further, if only one
time is selected, it should be at $t^+ = 1/\pi^2$.

Since the range of T is constrained to between 0 and $T_{\max}$, the max $\Delta^+$
criterion is appropriate for n equally spaced measurements starting at t = 0
(n is "large"). Using (8.2.13) with $\eta_{\max}^+ = 1$, an expression for $\Delta^+$ is
given. Necessary conditions for a maximum are

$$\frac{\partial \Delta^+}{\partial t_n^+} = 0, \qquad \frac{\partial \Delta^+}{\partial (x/L)} = 0 \tag{8.5.23}$$


Using Fig. 8.4 the optimal duration is $t_n^+ = \alpha t_n/L^2 = 1.691817/\pi^2$; the
optimal x/L is 0.5 as for one measurement. Note that though there is only
one parameter (namely $\alpha$), $\Delta^+$ is maximized with respect to two variables.

We can also locate optimal positions for two sensors. In this case of one
parameter, $\Delta^+$ is given by $C_{11}^+$ as defined by (8.3.8) with m = 2, $\eta_{\max}^+ = 1$, and
i = j = 1. $\Delta^+$ is maximized by putting both sensors at x/L = 0.5.
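The constant 1.691817 in the optimal duration can be recovered by setting the derivative of the time-averaged squared sensitivity to zero. The sketch below assumes the sensor is at mid-plane and writes s for π² times the dimensionless time, so the criterion is proportional to (1/s) times the integral of u² exp(-2u) from 0 to s; the bisection helper and its bracket are choices of this sketch.

```python
import math

def stationarity(s):
    """Numerator of d/ds [ (1/s) * integral_0^s u^2 exp(-2u) du ]."""
    integral = 0.25 * (1.0 - math.exp(-2.0 * s) * (1.0 + 2.0 * s + 2.0 * s * s))
    return s ** 3 * math.exp(-2.0 * s) - integral

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_opt = bisect(stationarity, 1.0, 3.0)         # s = pi^2 * t_n+
t_n_plus = s_opt / math.pi ** 2
print(f"pi^2 t_n+ = {s_opt:.6f},  t_n+ = {t_n_plus:.6f}")
```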

8.5.2.2 Constant Heat Flux at x = 0, Insulated at x = L

A case permitting the two parameters k and c to be estimated is a plate
exposed to a constant heat flux q on one side and insulated on the other.
Mathematically this problem is described by (8.5.1) and

$$-k\frac{\partial T}{\partial x}\bigg|_{x=0} = q, \qquad \frac{\partial T}{\partial x}\bigg|_{x=L} = 0, \qquad T(x,0) = T_0 \tag{8.5.24}$$

The dimensionless temperature [9] is

$$T^+ = t^+ + \frac{1}{3} - x^+ + \frac{(x^+)^2}{2} - \frac{2}{\pi^2}\sum_{n=1}^{\infty}\frac{1}{n^2}\exp(-n^2\pi^2 t^+)\cos(n\pi x^+) \tag{8.5.25}$$

where $T^+ \equiv (T - T_0)/(qL/k)$, $x^+ \equiv x/L$, and $t^+ \equiv \alpha t/L^2$. In Figures 8.14,
15, and 16 the dimensionless temperature and k and c sensitivities are
plotted versus $t^+$ for various positions in the plate. ($X_1^+$ and $X_2^+$ are
determined in Problem 8.11.) After an initial period, $T^+$ and $-X_2^+$
increase linearly with time whereas $X_1^+$ approaches various constant values
including zero. Since $X_1^+$ goes to zero near x/L = 0.5, this is a poor location
for a temperature sensor in this case.
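The long-time behavior described above can be examined with a direct evaluation of the series. This is a sketch, assuming the standard series solution for a plate heated by a constant flux at one face and insulated at the other, and assuming the dimensionless sensitivities follow from the chain rule with the dimensionless time proportional to k/c; the truncation at 200 terms and all names are choices of this sketch.

```python
import math

def temperature(xp, tp, terms=200):
    """Dimensionless temperature; xp = x/L, tp = alpha t / L^2."""
    series = sum(math.exp(-(n * math.pi) ** 2 * tp) * math.cos(n * math.pi * xp) / n ** 2
                 for n in range(1, terms + 1))
    return tp + 1.0 / 3.0 - xp + 0.5 * xp ** 2 - 2.0 / math.pi ** 2 * series

def heating_rate(xp, tp, terms=200):
    """Derivative of the dimensionless temperature with respect to tp (term by term)."""
    return 1.0 + 2.0 * sum(math.exp(-(n * math.pi) ** 2 * tp) * math.cos(n * math.pi * xp)
                           for n in range(1, terms + 1))

def sensitivities(xp, tp):
    """(X1+, X2+) for k and c; they sum with the temperature to zero."""
    rate = tp * heating_rate(xp, tp)
    return rate - temperature(xp, tp), -rate

for xp in (0.0, 1.0 - 1.0 / math.sqrt(3.0), 0.5, 1.0):
    x_k, x_c = sensitivities(xp, 2.0)
    print(f"x/L = {xp:.4f}  T+ = {temperature(xp, 2.0):.4f}  X1+ = {x_k:+.4f}  X2+ = {x_c:+.4f}")
```

In this sketch the long-time k sensitivity equals minus the steady spatial profile, which vanishes at x/L = 1 - 1/√3 (about 0.42), consistent with the remark that it goes to zero near mid-plane.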

Suppose that both k and c are to be estimated using many equally
spaced measurements. Assume that the standard assumptions denoted
11111-11 are valid. Since T increases without limit as $t \to \infty$, a constraint is
needed. The $\Delta^+$ criterion given by (8.3.7) can be used with $C_{ij}^+$ defined by
(8.3.8) to include this constraint. The term $\eta_{\max}^+$ is $(T_{\max} - T_0)/(qL/k)$, which




Figure 8.16 Dimensionless sensitivity for q = C at x = 0 and q = 0 at x = L.







is given by (8.5.25) evaluated at $x^+ = 0$ and $t^+ = t_n^+$; notice that $\eta_{\max}^+$ is a
function of only $t_n^+$. By using $\eta_{\max}^+$ in this manner the maximum temperature
$T_{\max}$ is made to be the same value for each duration.

Consider first the case of a single sensor. The optimal location is at x = 0
and the optimal duration for taking uniformly spaced measurements is
$t_n^+ = 1.2$. See case 1 of Table 8.3. This location is suggested from an
inspection of Figs. 8.15 and 16 because the magnitudes of the k and c
sensitivities are largest at x = 0. Their magnitudes were also largest for the
semi-infinite body, but we found that a single sensor at x = 0 for the
semi-infinite body would not permit both k and c to be estimated. The
difference between the two cases is that though the k and c sensitivities are
proportional [see (8.5.11)] at the heated surface of the semi-infinite body,
they are not proportional at x = 0 for the finite body (since $X_1^+$ approaches
a constant and $-X_2^+$ increases with time). It does happen that the k and c
sensitivities at x = 0 for the finite body are nearly proportional until time
$t^+ = 0.3$; clearly $\Delta^+$ must have a maximum at a larger time than that.


Table 8.3 Summary of Maximum Values of Δ+ for Finite Bodies
Insulated on One Side

      Boundary Condition   Location of            Maximum    Time of
Case  at x = 0             Temperature Sensors    Δ+         Maximum Δ+

1     q = constant         x = 0                  0.00098    1.2
2     q = constant         x = L                  0.00019    1.3
3     q = constant         x = 0 and L            0.00588    0.65
4     q for T = T_max      x = 0                  0.0291     1.8
5     q for T = T_max      x = 0 and L            0.0358     0.76


Two additional optimal cases for T given by (8.5.25) are listed in Table
8.3. Case 2 is for a single sensor at x = L. Case 3 is for two sensors
optimally located; of all possible two locations the best are at x = 0 and L.
If more than two sensors are used, the optimal design is approximated by
having m/2 sensors at x = 0 and m/2 at x = L. See Problem 8.13.

Recall from the way $\Delta^+$ is defined that having a multiple number of
sensors at the same location does not change the $\Delta^+$ values. Notice that
$\Delta^+$ of case 1 is about one sixth of $\Delta^+$ for case 3. Hence the use of one
sensor at x = 0 and another at x = L is much more effective for accurately
estimating k and c than placing both at x = 0.

In addition to optimal experiment durations and optimal sensor
locations, optimal boundary conditions could be sought. The optimal heat flux
boundary condition at x = 0 is a heat flux history which causes the surface
temperature to take a step increase to the maximum temperature. Cases 4
and 5 in Table 8.3 are for this boundary condition. Notice that $\Delta^+$ of case
5, which is for measurements at x = 0 and L, is the largest of all those listed
in Table 8.3. A still larger value is found if an optimal boundary condition
at x = L is used [10].

In Tables 8.2 and 8.3 a number of optimal experiments are given. If we
have the freedom to choose (1) the location and number of the temperature
sensors, (2) the time variation of the heat flux, and (3) the geometry,
an optimal experiment of those listed can be selected. In each case the
decision is simply based on the size of $\Delta^+$, with the largest values being
best. Notice for comparable heating conditions and locations of sensors
that the plate insulated at x = L is always better. One could continue this
search by modifying the insulation boundary condition and by investigating
other geometries such as cylinders and spheres.

8.5.3 Additional Cases

Applications of the optimal criteria for various ordinary and partial
differential equations are unlimited. The purpose of this subsection is to
provide more references.

Some analyses of optimal experiments involving ordinary differential
equations are given by Heineken et al. [11] and Seinfeld and Lapidus [12, p. 432].
These references relate to optimal design for chemical rate constants. An
ordinary differential equation in connection with the optimal design for heat
transfer coefficients is studied by Van Fossen [13].

Further cases involving optimal estimation of parameters in the heat
conduction equation or associated boundary conditions are given in
references 14-20. Most of these cases involve consideration of linear partial
differential equations. The dependent variable is usually a nonlinear
function of the parameters even though the differential equation model is
linear; nonlinear differential equations introduce further complications in
the design of experiments. Two papers studying nonlinear differential
equation models are [21], which considers the case of temperature-variable
k, and [22], which contains a study of optimal experiments for
freezing-melting problems. One difficulty is that the sensitivities must be obtained
numerically (see Section 7.10); the integrals in the $C_{ij}^+$'s must then be
evaluated using the trapezoidal rule or Simpson's rule. This is not a bothersome
difficulty. One more complexity is including the constraint of maximal
range of $\eta$ when $\eta$ is obtained from a nonlinear equation. In that case $\eta_{\max}^+$
in (8.3.6a) is not a simple function of $t_n$.
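When sensitivities are not available in closed form, they can be approximated by finite differences and the component integrals by a quadrature rule. The sketch below illustrates that workflow on a model whose exact sensitivity is known, so the numerical result can be compared against it. The constant-flux semi-infinite solution is used only as a stand-in model; the property values, step sizes, and function names are assumptions of this sketch.

```python
import math

def temperature_rise(t, x, k, c, q=1.0):
    """Constant-flux semi-infinite solution, used here only as a stand-in model."""
    if t <= 0.0:
        return 0.0
    alpha = k / c
    z = x / (2.0 * math.sqrt(alpha * t))
    ierfc = math.exp(-z * z) / math.sqrt(math.pi) - z * math.erfc(z)
    return 2.0 * q * math.sqrt(alpha * t) / k * ierfc

def scaled_sensitivity(model, t, x, params, index, rel_step=1e-4):
    """Central-difference estimate of beta_i * d(eta)/d(beta_i)."""
    up, down = list(params), list(params)
    up[index] *= 1.0 + rel_step
    down[index] *= 1.0 - rel_step
    return (model(t, x, *up) - model(t, x, *down)) / (2.0 * rel_step)

def trapezoid(values, h):
    """Trapezoidal rule for equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

k, c, x = 1.0, 1.0, 1.0                 # hypothetical properties and sensor depth
h, n = 0.01, 850                        # time step and number of steps
times = [i * h for i in range(n + 1)]
xc_numeric = [scaled_sensitivity(temperature_rise, t, x, (k, c), 1) for t in times]
xc_exact = [-math.sqrt(t / math.pi) * math.exp(-1.0 / (4.0 * t)) if t > 0.0 else 0.0
            for t in times]

c22_numeric = trapezoid([v * v for v in xc_numeric], h) / times[-1]
c22_exact = trapezoid([v * v for v in xc_exact], h) / times[-1]
worst = max(abs(a - b) for a, b in zip(xc_numeric, xc_exact))
print(f"largest sensitivity error: {worst:.2e}")
print(f"time-averaged square: numeric {c22_numeric:.6f}, exact {c22_exact:.6f}")
```

A relative step of about 1e-4 is a common compromise between truncation and round-off error for central differences in double precision; other models may need a different choice.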





8.5.4 Optimal Heat Conduction Experiment

As noted above there are many possible optimal experiments differing in
geometry, number of sensors, boundary conditions, and so on. We naturally
wish to design "best" experiments, but practical aspects frequently
mean that the optimum of all the optimal experiments cannot be chosen.
Section 7.9.4 describes an experiment that is optimal in many respects for
estimating k and c; this section is devoted to a description of the design of
that experiment.

From a comparison of optimal results in Tables 8.2 and 8.3 the finite
plate heated on one side and insulated on the other is found to be better
than the semi-infinite geometry. It is also experimentally practical.

The locations for two or more thermocouples are at the heated (x = 0)
and insulated (x = L) surfaces. An equal number should be placed at each
surface. Because eight were available, four were at x = 0 and four at x = L.
In order to ensure no direct heat losses from the heater, the heater was
placed between two identical specimens, both of which had two sensors at
x = 0 and two at L. This placement of multiple sensors at the same location
is contrary to intuition; one feels that a better design would be to place
each sensor at a different position relative to the heated surface. If the heat
conduction model used is correct, then the optimal locations are at x = 0
and x = L. Placing them in this manner, one maximizes $\Delta^+$, which
minimizes the variances of k and c. Furthermore, the assumptions of constant
variance and independent errors can be checked more readily than if
measurements are not replicated.

The insulation boundary condition at x = L can only be approximated
since there are no perfect thermal insulators. The validity of this
assumption can be investigated by noting if there is a characteristic "signature" in
the residuals.

With an electric heater a step increase in heat flux (i.e., constant flux) of
finite duration is easily introduced. The heat flux to cause a step change in
temperature at x = 0 (which is the optimal experiment in Table 8.3) is not
as readily applied. For that reason a constant heat flux for a finite duration
was used. Figure 8.17 shows the $\Delta^+$ criterion for this geometry for an equal
number of sensors at x = 0 and L. The heat flux is constant from time zero
until the end of the heating period. The constraints of a fixed large number
of measurements and the same maximum temperature rise are used. It is
found that a shorter duration of heating than the interval over which data
are used results in increased values of $\Delta^+$. This means that there are two
optimal times in this experiment: the duration of heating (a dimensionless
time of 0.5) and the maximum time at which data are used (a dimensionless
time of 0.75). The experiment was designed to be near these conditions.






Figure 8.17 The $\Delta^+$ criterion for a finite plate insulated at x = L and heated at x = 0 with a
constant heat flux for a finite heating period, after which the flux is zero. There are an equal
number of temperature sensors at x = 0 as at x = L.


After the experiment is performed and parameters estimated, one should 
check the validity of the assumptions. Residuals for an actual experiment 
are shown in Figs. 7.20 and 7.21. Most of the residuals tend to decrease 
with time for the last third of the experiment. This suggests heat losses at 
x=L and thus an imperfect model. Moreover, the residuals are highly 
correlated rather than being uncorrelated. In careful work both conditions 
would be further considered. It is anticipated, however, that the experi- 
ment design would not be greatly altered as the result of such investiga- 
tion. See the next section for a brief discussion of the treatment of 
correlated errors. 


8.6 NONSTANDARD ASSUMPTIONS 

In this section the basic criterion is modified for cases when two standard 
assumptions are no longer valid. The cases of nonconstant variance 
measurement errors and correlated errors are considered. 

8.6.1 Nonconstant Variance 

For all the standard assumptions being valid except that the error variance
$\sigma_i^2$ is not constant, the error covariance matrix is given by
$\psi = \operatorname{diag}[\sigma_1^2 \cdots \sigma_n^2]$. For maximum likelihood estimation the criterion to
maximize is (8.3.2). All the equations given above which include various
constraints still may be used for nonconstant variance by simply replacing
$X_{ij}$ by $X_{ij}\sigma_i^{-1}$.
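The replacement rule can be illustrated for two parameters: scaling each row of the sensitivity matrix by the reciprocal of its standard deviation gives the same determinant as weighting the products directly. The sensitivity values and standard deviations below are made up for the illustration, and the helper names are hypothetical.

```python
def weighted_gram(X, weights):
    """X^T W X for a diagonal weight matrix W given as a list."""
    p = len(X[0])
    return [[sum(w * row[a] * row[b] for row, w in zip(X, weights))
             for b in range(p)] for a in range(p)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Made-up sensitivities (rows = measurements) and error standard deviations.
X = [[1.0, 0.5], [0.8, 1.0], [0.6, 1.5], [0.4, 2.0]]
sigma = [0.1, 0.1, 0.2, 0.4]

direct = det2(weighted_gram(X, [1.0 / s ** 2 for s in sigma]))
scaled_X = [[value / s for value in row] for row, s in zip(X, sigma)]
via_scaling = det2(weighted_gram(scaled_X, [1.0] * len(X)))
print(direct, via_scaling)
```

Measurements with large variance are down-weighted, so an experiment that spends most of its duration in a noisy regime scores lower than the constant-variance criterion would suggest.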

8.6.2 Correlated Errors

A particular type of correlated errors is the first-order autoregressive error
which is described by

$$\varepsilon_i = \rho\,\varepsilon_{i-1} + u_i, \qquad i = 1, 2, \ldots, n \tag{8.6.1}$$

where the $u_i$ are normal and independent with zero mean and variance $\sigma^2$.
When maximum likelihood estimation is used, this case can also build on
the previous results by replacing $X_{ij}$ by

$$Z_{ij} = X_{ij} - \rho X_{i-1,j}, \qquad i = 1, \ldots, n; \quad j = 1, \ldots, p \tag{8.6.2}$$

where $X_{0j}$ is defined to be zero for all permissible j values. For many
equally spaced measurements in time, $Z_{ij}$ can be approximated by

$$Z_{ij} \approx (1 - \rho) X_{ij} + \rho\,\Delta t\,\frac{\partial X_{ij}}{\partial t} \tag{8.6.3}$$

which indicates that as $\rho$ approaches unity (perfect correlation) the time
derivatives of the sensitivity coefficients become important.
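The transformed sensitivities and their small-time-step approximation can be compared directly. A minimal sketch, assuming a constant correlation parameter and using the sensitivity of a simple exponential decay as a made-up test column; the time step, correlation value, and names are illustrative.

```python
import math

def ar1_transform(column, rho):
    """Z_i = X_i - rho * X_{i-1}, with the value before the first sample taken as zero."""
    transformed = []
    previous = 0.0
    for value in column:
        transformed.append(value - rho * previous)
        previous = value
    return transformed

dt, rho = 0.01, 0.9                                      # hypothetical step and correlation
times = [dt * i for i in range(1, 501)]
X = [-t * math.exp(-t) for t in times]                   # sensitivity of exp(-beta t) at beta = 1
dX_dt = [(t - 1.0) * math.exp(-t) for t in times]        # its time derivative

Z = ar1_transform(X, rho)
approx = [(1.0 - rho) * x + rho * dt * d for x, d in zip(X, dX_dt)]
max_gap = max(abs(z - a) for z, a in zip(Z[1:], approx[1:]))
print(f"largest difference between exact and approximate Z: {max_gap:.2e}")
```

The first sample is excluded from the comparison because its transformed value uses the zero starting convention rather than a neighboring measurement.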


8.7 SEQUENTIAL OPTIMIZATION

Suppose that a set of experiments has been performed and the associated
parameters and parameter covariance matrix have been estimated. These
experiments need not have been optimally designed, but the next experiment
(or set of measurements) is to be optimally designed. Suppose also
that (subjective) MAP estimation is being used and that the standard
assumptions denoted 11--1113 are valid. The criterion to maximize in this
case is (see Appendix 8A)

$$\Delta = \left| X^T \psi^{-1} X + V_\beta^{-1} \right| \tag{8.7.1}$$

where $X^T \psi^{-1} X$ is for the proposed experiment and $V_\beta$ is the covariance
matrix of the estimated parameter values based on data of the previous
experiments and prior information. The dimensions of $X^T \psi^{-1} X$ and $V_\beta^{-1}$
must be the same, that is, involve the same number of parameters.




To illustrate the criterion given by (8.7.1) assume that one previous
experiment has been performed and that negligible prior information is
available so that $V_\beta^{-1}$ is

$$V_\beta^{-1} = [X^T \psi^{-1} X]_1 \tag{8.7.2}$$

Then the second experiment would be designed so that

$$\Delta = \left| [X^T \psi^{-1} X]_1 + [X^T \psi^{-1} X]_2 \right| \tag{8.7.3}$$

is maximized by varying the experiment duration, etc. Only the terms
in $[X^T \psi^{-1} X]_2$ would be changed. In some cases the second experiment
might be similar to the first one while in other cases it would be quite
different.

The criterion given by (8.7.1) can also be expressed in a different form.
By multiplying (8.7.1) by $|V_\beta|$ we find

$$\Delta |V_\beta| = \left| I + X^T \psi^{-1} X V_\beta \right| = \left| I + \psi^{-1} X V_\beta X^T \right| \tag{8.7.4}$$

Now using the identity $|I + AB| = |I + BA|$, (8.7.4) can be written as

$$\Delta = \frac{\left| \psi + X V_\beta X^T \right|}{|V_\beta|\,|\psi|} \tag{8.7.5}$$

Since $|V_\beta|\,|\psi|$ is a positive constant, maximizing $\Delta$ is equivalent to
maximizing

$$\Gamma = \left| \psi + X V_\beta X^T \right| \tag{8.7.6}$$

Hence we have a choice between maximizing $\Delta$ or $\Gamma$. Our choice should
depend on the relative dimensions of the two matrices, which are p × p and
n × n, respectively. The determinant of lower dimension would be chosen.
A case favorable to using $\Gamma$ is for $p \geq 2$ and n = 1, that is, a single
measurement of the dependent variable is made.
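The equivalence of the two forms is easy to check for the favorable case of one new measurement and two parameters, where the second form reduces to a scalar. The numbers below are made up for the illustration, and the helper names are hypothetical.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

# Made-up values: one proposed measurement (n = 1) and two parameters (p = 2).
x = [1.0, 2.0]                       # sensitivity row for the proposed measurement
psi = 0.5                            # variance of that measurement
V = [[2.0, 0.5], [0.5, 1.0]]         # covariance of the current parameter estimates

V_inv = inv2(V)
information = [[x[a] * x[b] / psi + V_inv[a][b] for b in range(2)] for a in range(2)]
delta = det2(information)            # p x p determinant form
gamma = psi + sum(x[a] * V[a][b] * x[b] for a in range(2) for b in range(2))  # scalar form

print(delta * det2(V) * psi, gamma)
```

With a single measurement the scalar form needs no matrix inversion at all, which is why it is the natural choice when candidate measurements are screened one at a time.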


8.8 NOT ALL PARAMETERS OF INTEREST 

There are parameter estimation problems that require the estimation of
parameters in addition to those of primary interest. The extra parameters
are sometimes termed nuisance parameters. In Example 8.2.4 the reciprocal
time constant of the billet might be the parameter of interest;
however, it might also be necessary to estimate simultaneously the fluid
temperature. Another type of problem is when statistical parameters
such as the correlation $\rho$ in the autoregressive error model (8.6.1) are
found. Though the $\rho$ value may be needed to estimate the confidence
region, generally its value is not needed as accurately as those of the
"physical" parameters. Further examples are given by Hunter and Hill
[23, 24].

Appendix 8B gives a derivation of a criterion when out of a total of p
estimated parameters only the first q (p > q) are of interest. For the
standard assumptions designated 11111-11 the criterion is to maximize

$$\left| X_1^T X_1 - X_1^T X_2 (X_2^T X_2)^{-1} X_2^T X_1 \right| = \frac{\Delta_p}{|X_2^T X_2|} \tag{8.8.1}$$

where $X_1$ is an n × q matrix and is for the first q parameters and where $X_2$
is an n × r matrix which is for the remaining r = p - q parameters that are
not of primary interest. The symbol $\Delta_p$ means the usual determinant of all
the parameters, i.e.,

$$\Delta_p = |X^T X| \quad \text{where} \quad X = [X_1 \; X_2]$$

The minimum q and r values are q = 1 and r = 1. In summation form this
simple case results in

$$\Delta_{12} = \sum X_{i1}^2 - \left(\sum X_{i1} X_{i2}\right)^2 \left(\sum X_{i2}^2\right)^{-1} \tag{8.8.2}$$

Let the condition of a fixed large number of measurements equally spaced
in time be valid; by using the notation given by (8.3.5), $\Delta_{12}$ can be
approximated by

$$\Delta_{12} = n \Delta_{12}^+ \approx n\left[C_{11} - C_{12}^2 C_{22}^{-1}\right] = n \Delta^+ C_{22}^{-1} \tag{8.8.3}$$

A comparison of this expression with (8.5.17) shows that $\Delta_{12}^+$ is
proportional to the reciprocal of the variance of $b_1$. Hence maximizing $\Delta_{12}^+$ has the
beneficial effect of minimizing the variance of $b_1$.
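As an example of the use of the max $\Delta_{12}^+$ criterion consider the
exponential model

$$\eta = \beta_1 \exp(-\beta_2 t) \tag{8.8.4}$$

which has one linear and one nonlinear parameter. For $\beta_1$ being the
parameter of interest, the criterion is to maximize $\Delta_{12}^+$ as implied by
(8.8.3); here

$$C_{11} = \frac{1}{t_n}\int_0^{t_n} e^{-2\beta_2 t}\,dt = \frac{1}{2\tau}\left[1 - \exp(-2\tau)\right] \tag{8.8.5}$$

$$C_{22} = \frac{\beta_1^2}{t_n}\int_0^{t_n} t^2 e^{-2\beta_2 t}\,dt = \left(\frac{\beta_1}{\beta_2}\right)^2 \frac{1}{4\tau}\left[1 - (1 + 2\tau + 2\tau^2)\exp(-2\tau)\right] \tag{8.8.6}$$

For $C_{12}$, see Problem 8.1. From these expressions we find $\Delta_{12}^+$, which is
plotted as a function of $\tau = \beta_2 t_n$ as depicted in Fig. 8.18. For $\beta_1$ being the
primary parameter of interest, the optimal value of $\tau$ is zero.

If instead $\beta_2$ is the parameter of interest, the subscripts on
the C's in (8.8.3) are interchanged. Since the resulting criterion $\Delta_{21}^+$ is
proportional to $(\beta_1/\beta_2)^2$, plotted also in Fig. 8.18 is $\beta_2^2 \Delta_{21}^+/\beta_1^2$. The optimal
time for this case is $\tau = 2.0$. This dimensionless time can be compared
with the optimal times for estimating $\beta_2$ alone of 1.69
and both $\beta_1$ and $\beta_2$ of 1.191. Hence the optimal durations of the
experiment can be quite different for the various objectives of $\beta_1$ only being of
interest, and so on.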
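The three criteria for the exponential model can be evaluated from closed-form time averages. The sketch below assumes unit parameter values (so the factors involving the parameter ratio drop out) and uses closed forms obtained by integrating the sensitivity products analytically; the function names and the search grid are choices of this sketch.

```python
import math

def components(tau):
    """C11, C12, C22 for eta = beta1 exp(-beta2 t) with beta1 = beta2 = 1."""
    e = math.exp(-2.0 * tau)
    c11 = (1.0 - e) / (2.0 * tau)
    c12 = -(1.0 - (1.0 + 2.0 * tau) * e) / (4.0 * tau)
    c22 = (1.0 - (1.0 + 2.0 * tau + 2.0 * tau ** 2) * e) / (4.0 * tau)
    return c11, c12, c22

def beta1_of_interest(tau):
    c11, c12, c22 = components(tau)
    return c11 - c12 ** 2 / c22

def beta2_of_interest(tau):
    c11, c12, c22 = components(tau)
    return c22 - c12 ** 2 / c11

def both_of_interest(tau):
    c11, c12, c22 = components(tau)
    return c11 * c22 - c12 ** 2

taus = [0.01 * i for i in range(10, 501)]      # tau = beta2 * t_n from 0.1 to 5
best_beta2 = max(taus, key=beta2_of_interest)
best_both = max(taus, key=both_of_interest)
print(f"beta2 of interest: tau = {best_beta2:.2f}")
print(f"both of interest:  tau = {best_both:.2f}")
print("beta1 of interest at tau = 0.1, 1, 3:",
      [round(beta1_of_interest(t), 4) for t in (0.1, 1.0, 3.0)])
```

In this sketch the criterion for the linear parameter decreases steadily as the duration grows, while the other two criteria have interior maxima near the dimensionless times quoted in the text.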



Figure 8.18 Criteria for optimal estimation of $\beta_1$ and $\beta_2$ in the model $\eta = \beta_1 \exp(-\beta_2 t)$
where each may be of primary interest.





8.9 DESIGN CRITERIA FOR MODEL DISCRIMINATION

Sometimes the physical model is not known but several alternate models
can be proposed. In such cases the problem is to select the "best" model,
that is, the one that best fits the data. A method of model selection, termed
model discrimination, involves experimental designs that maximize
differences between predicted responses of two or more models.

A chemical engineering example of a case where discrimination is
needed occurs when substance A reacts in the presence of a catalyst to
form substance B, which in turn forms C. Two possible models are
A→B→C and A→B⇌C. The predicted concentrations of substance B
versus time for the two models are shown in Fig. 8.19. If the reaction is
observed only until time $t_1$, no discrimination can be accomplished
because the predicted responses are nearly identical until $t_1$. Measured values
of the B concentration are required after time $t_1$ (and preferably near $t_2$) to
determine the best model.

Many methods of model discrimination have been proposed. Given first
is a method that results in a criterion similar to $\Delta$. Next discussed is a
method utilizing information theory. The former method is simpler in
application but the latter has a more satisfying basis. In each case the
analyses start with consideration of two competing mathematical models.





Figure 8.19 Discrimination example involving concentration of substance B for models
A→B→C and A→B⇌C.





8.9.1 Linearization Method 

In this method the objective is to seek experiments that cause the minimum
values of the sum of squares functions to be quite different for two
competing models. Suppose two models are available and the best one is to
be determined. Let the standard assumptions 1111--11 be valid and OLS
estimation be used. (The analysis can be modified for other cases.) The
sum of squares function for model $i$ can be written as

$$S^{(i)} = (\mathbf{Y} - \boldsymbol{\eta}^{(i)})^T (\mathbf{Y} - \boldsymbol{\eta}^{(i)}) \tag{8.9.1}$$

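Let the model equation be $\boldsymbol{\eta}^{(i)} = \boldsymbol{\eta}^{(i)}(\mathbf{x}, \boldsymbol{\beta}^{(0)}, \boldsymbol{\beta}^{(i)})$, where $\mathbf{x}$ is the independent
variable vector, $\boldsymbol{\beta}^{(0)}$ is the vector of parameters common to both models (if
there are common ones), and $\boldsymbol{\beta}^{(i)}$ is the vector of parameters distinctive
to model $i$ ($q$ of them for Model 1 and $r$ for Model 2). Suppose that a nominal
set of parameters is chosen and that $\boldsymbol{\eta}^{(i)}$ is expressed in terms of a Taylor
series near this nominal set so that

$$\boldsymbol{\eta}^{(i)} \approx \boldsymbol{\eta}^{(0)} + \mathbf{X}^{(i)} \Delta\boldsymbol{\beta}^{(i)} \tag{8.9.2}$$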

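where $\mathbf{X}^{(i)}$ is the sensitivity matrix for $\boldsymbol{\beta}^{(i)}$. Introducing (8.9.2) into (8.9.1),
where the $\Delta\boldsymbol{\beta}^{(i)}$ values are chosen to minimize $S^{(i)}$, yields

$$\min S^{(i)} = (\mathbf{Y} - \boldsymbol{\eta}^{(0)})^T (\mathbf{Y} - \boldsymbol{\eta}^{(0)}) + 2(\mathbf{X}^{(i)} \Delta\boldsymbol{\beta}^{(i)})^T (\boldsymbol{\eta}^{(0)} - \mathbf{Y}) + (\mathbf{X}^{(i)} \Delta\boldsymbol{\beta}^{(i)})^T \mathbf{X}^{(i)} \Delta\boldsymbol{\beta}^{(i)} \tag{8.9.3}$$

which implies the $\Delta\boldsymbol{\beta}^{(i)}$ vector of

$$\Delta\boldsymbol{\beta}^{(i)} = (\mathbf{X}^{(i)T} \mathbf{X}^{(i)})^{-1} \mathbf{X}^{(i)T} (\mathbf{Y} - \boldsymbol{\eta}^{(0)}) \tag{8.9.4}$$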

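Let us now subtract $\min S^{(2)}$ from $\min S^{(1)}$ and attempt to find the
maximum of the absolute value of the difference, or

$$C = \max\left|\min S^{(1)} - \min S^{(2)}\right| = \max\left|(\mathbf{Y} - \boldsymbol{\eta}^{(0)})^T \left[\mathbf{X}^{(2)} (\mathbf{X}^{(2)T} \mathbf{X}^{(2)})^{-1} \mathbf{X}^{(2)T} - \mathbf{X}^{(1)} (\mathbf{X}^{(1)T} \mathbf{X}^{(1)})^{-1} \mathbf{X}^{(1)T}\right] (\mathbf{Y} - \boldsymbol{\eta}^{(0)})\right| \tag{8.9.5}$$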
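
The step from (8.9.3) and (8.9.4) to (8.9.5) can be spelled out as follows. The shorthand symbols $\mathbf{r}$ and $\boldsymbol{\Pi}^{(i)}$ are introduced only for this sketch and are not part of the notation used elsewhere in the chapter.

```latex
\begin{aligned}
\mathbf{r} &\equiv \mathbf{Y} - \boldsymbol{\eta}^{(0)}, \qquad
\boldsymbol{\Pi}^{(i)} \equiv \mathbf{X}^{(i)} \bigl(\mathbf{X}^{(i)T}\mathbf{X}^{(i)}\bigr)^{-1} \mathbf{X}^{(i)T},
\qquad \mathbf{X}^{(i)}\Delta\boldsymbol{\beta}^{(i)} = \boldsymbol{\Pi}^{(i)}\mathbf{r}, \\
\min S^{(i)} &= \mathbf{r}^T\mathbf{r} - 2\,\mathbf{r}^T\boldsymbol{\Pi}^{(i)}\mathbf{r}
              + \mathbf{r}^T\boldsymbol{\Pi}^{(i)}\boldsymbol{\Pi}^{(i)}\mathbf{r}
              = \mathbf{r}^T\bigl(\mathbf{I} - \boldsymbol{\Pi}^{(i)}\bigr)\mathbf{r}, \\
\min S^{(1)} - \min S^{(2)} &= \mathbf{r}^T\bigl(\boldsymbol{\Pi}^{(2)} - \boldsymbol{\Pi}^{(1)}\bigr)\mathbf{r}.
\end{aligned}
```

The second line relies on $\boldsymbol{\Pi}^{(i)}$ being symmetric and idempotent, so that $\boldsymbol{\Pi}^{(i)}\boldsymbol{\Pi}^{(i)} = \boldsymbol{\Pi}^{(i)}$.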

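Although we do not know $\mathbf{Y} - \boldsymbol{\eta}^{(0)}$, let us assume temporarily that Model
1 is correct and that the measurement errors are sufficiently small so that

$$\mathbf{Y} - \boldsymbol{\eta}^{(0)} \approx \mathbf{X}^{(1)} \Delta\boldsymbol{\beta}^{(1)} \tag{8.9.6}$$

Then $C$ given by (8.9.5) becomes

$$C = \max\left|\Delta\boldsymbol{\beta}^{(1)T} \mathbf{M}^{(1)} \Delta\boldsymbol{\beta}^{(1)}\right| \tag{8.9.7a}$$

$$\mathbf{M}^{(1)} = \mathbf{X}^{(1)T}\mathbf{X}^{(1)} - \mathbf{X}^{(1)T}\mathbf{X}^{(2)} (\mathbf{X}^{(2)T}\mathbf{X}^{(2)})^{-1} \mathbf{X}^{(2)T}\mathbf{X}^{(1)} \tag{8.9.7b}$$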

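The $q \times q$ matrix $\mathbf{M}^{(1)}$ is exactly the same matrix whose determinant is
maximized when $\boldsymbol{\beta}^{(1)}$ is for $q$ parameters of primary interest and $\boldsymbol{\beta}^{(2)}$ is for
$r$ parameters of less interest; see (8.8.1).

If instead of Model 1 being correct, Model 2, which involves $r$ parameters,
is correct, (8.9.7) becomes

$$C = \max\left|\Delta\boldsymbol{\beta}^{(2)T} \mathbf{M}^{(2)} \Delta\boldsymbol{\beta}^{(2)}\right| \tag{8.9.8a}$$

$$\mathbf{M}^{(2)} = \mathbf{X}^{(2)T}\mathbf{X}^{(2)} - \mathbf{X}^{(2)T}\mathbf{X}^{(1)} (\mathbf{X}^{(1)T}\mathbf{X}^{(1)})^{-1} \mathbf{X}^{(1)T}\mathbf{X}^{(2)} \tag{8.9.8b}$$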

The problem now is to select some criterion that has the effect of
maximizing $C$. If $C$ is fixed at some value, (8.9.7a) and (8.9.8a) both
describe the surfaces of hyperellipsoids, since both are very similar to the
confidence region expression given by (6.8.39). The coordinates are the
$\Delta\beta$'s. In the case of (8.9.7a), for a given hypervolume, $C$ is maximized by
maximizing the determinant of $\mathbf{M}^{(1)}$. For Model 2 being correct, the
analogous criterion is the maximization of $|\mathbf{M}^{(2)}|$. But since we do not know
which model is correct, we choose a criterion that does not prefer one
model over the other. Such a criterion is simply formed from the
augmented $\mathbf{X}^T\mathbf{X}$ matrix. That is, we propose that discrimination can be
improved by designing experiments so that

$$\Delta = |\mathbf{X}^{(1)T}\mathbf{X}^{(1)}|\,|\mathbf{M}^{(2)}| = |\mathbf{X}^{(2)T}\mathbf{X}^{(2)}|\,|\mathbf{M}^{(1)}| \tag{8.9.9}$$

is maximized. Note now that the $\mathbf{X}$ matrix is composed of sensitivity
matrices from two different models and that $\mathbf{X}^{(1)}$ has dimensions $n \times q$ and
$\mathbf{X}^{(2)}$ has dimensions $n \times r$. An advantage of the $\Delta$ criterion given by (8.9.9)
is that it is simple; its use is similar to the $\Delta$ criterion discussed in Sections
8.1-8.7. A further advantage is that no data are needed for the design of
experiments; only the models and approximate values of the parameters
are required.
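
The factorization in (8.9.9) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the measurement times and the three sensitivity functions are made up for illustration, and the helper names `discrimination_delta` and `schur_M` are hypothetical.

```python
import numpy as np

def discrimination_delta(X1, X2):
    """Delta = |X^T X| for the augmented sensitivity matrix X = [X1 X2]."""
    X = np.hstack([X1, X2])
    return np.linalg.det(X.T @ X)

def schur_M(Xa, Xb):
    """M matrix for model a with model b projected out, as in (8.9.7b)/(8.9.8b)."""
    return Xa.T @ Xa - Xa.T @ Xb @ np.linalg.solve(Xb.T @ Xb, Xb.T @ Xa)

# Hypothetical design: n = 6 measurement times (illustrative values only).
t = np.linspace(0.5, 5.0, 6)
X1 = np.column_stack([1.0 - np.exp(-t)])       # Model 1 sensitivities, q = 1
X2 = np.column_stack([np.sin(t), np.cos(t)])   # Model 2 sensitivities, r = 2

delta = discrimination_delta(X1, X2)
via_M1 = np.linalg.det(X2.T @ X2) * np.linalg.det(schur_M(X1, X2))
via_M2 = np.linalg.det(X1.T @ X1) * np.linalg.det(schur_M(X2, X1))
print(delta, via_M1, via_M2)   # the three numbers should agree
```

Only the sensitivity matrices enter the calculation, which is why no measured data are needed to evaluate the criterion.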
The effect of maximizing $\Delta$ given by (8.9.9) is to emphasize the
differences between the models. All models fail at some point, and it may be
that at these points the greatest discrimination power is present. For
example, there are certain heat conduction problems in which changes
occurring during heating of a material may be due to a change of phase or
a chemical reaction. One of these is reversible and the other is not. This
suggests that the critical temperature range be covered using a cooling
after a heating process. The behaviors of the change-of-phase and reaction
models are quite different during the cooling period.

Another example where discrimination might be used in determining if 
/! = ^j + ^ 2 ^ ^ = + )S 2 (T’— r^) is the better model of Section 7.5.2. 


Example 8.9.1 

Consider the two competing models

$$\eta_1 = \beta_1 + \beta_2 (1 - e^{-t}), \qquad \eta_2 = \beta_1 + \beta_2 \sin t$$

The standard error assumptions are valid. The optimal duration of experiments for
a large fixed number of uniformly spaced measurements starting at $t = 0$ is to be
found. No constraints on the ranges of the $\eta$'s are to be used.


Solution 

Since the constant parameter $\beta_1$ appears in both models, both models are alike to
that extent. Hence the emphasis should be upon the $\beta_2$ terms. Using the above
notation we have

$$\mathbf{X}^{(1)T} = [\,1 - e^{-t_1} \;\cdots\; 1 - e^{-t_n}\,], \qquad \mathbf{X}^{(2)T} = [\,\sin t_1 \;\cdots\; \sin t_n\,]$$


The quantity to maximize is $\Delta$ given by (8.9.9). To include the assumption of a
fixed large number of uniformly spaced measurements, $\Delta$ should be modified to $\Delta^n$
as indicated by (8.3.5). In this case $C_{11}$ would be $C_{11}$ of Fig. 8.11 and $C_{22}$ would be
$C_{22}$ of Fig. 8.9. The resulting $\Delta^n$ is nearly zero until time $t_n = 2.5$, at which time $\Delta^n$
rises quickly to the first local maximum of about 0.4 at time 5.5. After this time the
$\Delta^n$ criterion gradually oscillates to larger values, with the global maximum being 0.5
at $t_n \to \infty$. These results are reasonable because $\sin t$ and $1 - e^{-t}$ are similar until
$t = 1.5$ but are quite dissimilar for $t > 3$.


According to the max $\Delta^n$ criterion, which assumes many equally spaced
measurements starting at $t = 0$, the experiment should be of infinite duration, but for
practical purposes it could be any time greater than $t_n = 5$ to discriminate between
the two models.
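
The numbers quoted in this example can be reproduced approximately with a short calculation. The sketch below assumes that the criterion for many uniformly spaced measurements is the determinant of the time-averaged matrix with elements $(1/t_n)\int_0^{t_n} X_i X_j\,dt$, built from the two $\beta_2$ sensitivities only; the function name `delta_n` and the trapezoidal quadrature are choices made here, not taken from the text.

```python
import numpy as np

def delta_n(t_n, m=4001):
    """Determinant of the time-averaged X^T X for X1 = 1 - exp(-t), X2 = sin(t)."""
    t = np.linspace(0.0, t_n, m)
    X = np.column_stack([1.0 - np.exp(-t), np.sin(t)])
    # Trapezoidal weights, already divided by t_n, so that
    # C[i, j] approximates (1/t_n) * integral of X_i X_j dt over [0, t_n].
    w = np.full(m, 1.0 / (m - 1))
    w[0] = w[-1] = 0.5 / (m - 1)
    C = X.T @ (w[:, None] * X)
    return np.linalg.det(C)

for t_n in (1.0, 2.5, 5.5, 20.0, 200.0):
    print(t_n, round(delta_n(t_n), 4))
```

Under these assumptions the criterion stays close to zero up to about $t_n = 2.5$, is near 0.4 at $t_n = 5.5$, and approaches 0.5 for long durations.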


8.9.2 Information Theory Method 

Suppose that two rival models are available, $\boldsymbol{\eta}^{(i)} = \mathbf{f}^{(i)}(\mathbf{x}, \boldsymbol{\beta}^{(i)})$ where $i = 1$
and 2. Assume that estimates $\mathbf{b}^{(i)}$ for the parameters appearing in the $i$th
model are available and that the associated estimated covariance matrix $\mathbf{P}^{(i)}$
is known. Typically these are obtained by fitting each model in turn to
data from previously performed experiments. Using the parameter values
$\mathbf{b}^{(i)}$, the values of the dependent variable can be predicted for any
proposed experiment, assuming the $i$th model is correct. This prediction is
designated

$$\hat{\mathbf{Y}}^{(i)} = \mathbf{f}^{(i)}(\mathbf{x}, \mathbf{b}^{(i)}) \tag{8.9.10}$$

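The covariance matrix of the prediction error in (8.9.10), assuming that
model $i$ is correct, can be shown to be approximately

$$\mathbf{V}^{(i)} \approx \boldsymbol{\psi} + \mathbf{X}^{(i)} \mathbf{P}^{(i)} \mathbf{X}^{(i)T} \tag{8.9.11}$$

where $\boldsymbol{\psi}$ is the covariance of the measurement errors of $\mathbf{Y}$, $\mathbf{X}^{(i)}$ is the
sensitivity matrix for the $i$th model and the experiment being considered,
and $\mathbf{P}^{(i)}$ is the estimated covariance matrix of $\mathbf{b}^{(i)}$.
The second term on the right side of (8.9.11) is similar to that given by
(6.2.12a) or (6.5.6).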

The hypothesis that the $i$th model is correct leads to regarding the
outcome of a proposed experiment $\mathbf{x}$ as a random variable $\eta$ with probability
density function $p^{(i)}(\eta|\mathbf{x})$ having mean and covariance given by (8.9.10)
and (8.9.11), respectively. If Model 1 is correct, then $\eta$ is distributed as
$p^{(1)}(\eta|\mathbf{x})$; if Model 2 is correct, $\eta$ is distributed as $p^{(2)}(\eta|\mathbf{x})$. Kullback [28]
has suggested that the quantity $\ln[p^{(1)}(\eta|\mathbf{x})/p^{(2)}(\eta|\mathbf{x})]$ is a measure of the
favorability of hypothesis 1 over hypothesis 2. The expected information in
favor of Model (or hypothesis) 1 is

$$\int_{-\infty}^{\infty} p^{(1)}(\eta|\mathbf{x}) \ln\frac{p^{(1)}(\eta|\mathbf{x})}{p^{(2)}(\eta|\mathbf{x})}\, d\eta$$

But since it is not known whether Model 1 or 2 is correct, Kullback
suggested that the measure of total information $J_{1,2}$ be maximized, where

$$J_{1,2}(\mathbf{x}) = \int_{-\infty}^{\infty} \left[ p^{(1)}(\eta|\mathbf{x}) \ln\frac{p^{(1)}(\eta|\mathbf{x})}{p^{(2)}(\eta|\mathbf{x})} + p^{(2)}(\eta|\mathbf{x}) \ln\frac{p^{(2)}(\eta|\mathbf{x})}{p^{(1)}(\eta|\mathbf{x})} \right] d\eta \tag{8.9.12}$$

The objective is to select an experiment $\mathbf{x}$ that maximizes $J_{1,2}(\mathbf{x})$. A large
value of $J_{1,2}$ can be obtained only if $p^{(1)}$ is much larger than $p^{(2)}$ or vice
versa. In either case the result is a strong preference for one model over the
other. The quantity $J_{1,2}$ is called by Kullback the information for
discrimination and is similar to (8A.4).





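Let the measurement errors be normal (more specifically, 11--1011) and
let the model errors have covariance matrices $\mathbf{V}^{(1)}$ and $\mathbf{V}^{(2)}$. Then it can be
shown that

$$J_{1,2}(\mathbf{x}) = -m + \tfrac{1}{2}\,\mathrm{tr}\left[\mathbf{U}^{(1)}\mathbf{V}^{(2)} + \mathbf{U}^{(2)}\mathbf{V}^{(1)}\right] + \tfrac{1}{2}\,(\hat{\mathbf{Y}}^{(2)} - \hat{\mathbf{Y}}^{(1)})^T (\mathbf{U}^{(1)} + \mathbf{U}^{(2)}) (\hat{\mathbf{Y}}^{(2)} - \hat{\mathbf{Y}}^{(1)}) \tag{8.9.13}$$

where $\mathbf{U}^{(i)} = (\mathbf{V}^{(i)})^{-1}$ and $m$ is the number of elements in $\hat{\mathbf{Y}}^{(i)}$. An important
special case occurs when one dependent variable is present in the model and
only one measurement of it is made. Then for $i = 1, 2$, $V^{(i)} = \sigma_i^2$ and
$U^{(i)} = \sigma_i^{-2}$, where

$$\sigma_i^2 = s^2 + \sum_{k}\sum_{l} X_k^{(i)} X_l^{(i)} P_{kl}^{(i)} \tag{8.9.14}$$

with the sums taken over the parameters of model $i$, and $s^2$ is an estimate
of $V(Y)$. Then (8.9.13) becomes

$$J_{1,2}(x) = \frac{1}{2}\left[ \frac{(\sigma_1^2 - \sigma_2^2)^2}{\sigma_1^2 \sigma_2^2} + (\hat{Y}^{(1)} - \hat{Y}^{(2)})^2 \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right) \right] \tag{8.9.15}$$

The information regarding the experiment $x$ is contained in $\sigma_1^2$, $\sigma_2^2$, $\hat{Y}^{(1)}$,
and $\hat{Y}^{(2)}$. The objective is now to choose the measurement so that $J_{1,2}(x)$ is
maximized.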
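
As an illustration of the single-measurement case, the sketch below evaluates the discrimination information in two ways: from the compact form usually attributed to Box and Hill, and from the scalar ($m = 1$) version of (8.9.13). The function names and the numerical values are hypothetical; the prediction variances are simply supplied as inputs rather than computed from (8.9.14).

```python
def discrimination_information(y1, y2, var1, var2):
    """Single-measurement J_{1,2}: y1, y2 are the two predicted responses,
    var1, var2 the corresponding prediction variances sigma_1^2, sigma_2^2."""
    spread = (var1 - var2) ** 2 / (var1 * var2)
    separation = (y1 - y2) ** 2 * (1.0 / var1 + 1.0 / var2)
    return 0.5 * (spread + separation)

def general_form_scalar(y1, y2, var1, var2):
    """Scalar (m = 1) version of the general normal-error expression (8.9.13)."""
    u1, u2 = 1.0 / var1, 1.0 / var2
    return -1.0 + 0.5 * (u1 * var2 + u2 * var1) + 0.5 * (y2 - y1) ** 2 * (u1 + u2)

# Made-up candidate experiments: (label, y1, y2, var1, var2).
candidates = [
    ("t = 1", 1.00, 1.02, 0.04, 0.04),   # predictions nearly coincide
    ("t = 3", 1.00, 1.50, 0.04, 0.09),   # predictions well separated
]
best = max(candidates, key=lambda c: discrimination_information(*c[1:]))
print(best[0])
```

The candidate with the larger separation between the predicted responses, relative to the prediction variances, is the one selected.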

Box and Hill [29] were the first to derive (8.9.15); they have been 
pioneers in the application of sequential design of experiments for model 
discrimination. 

Let us briefly consider some implications of (8.9.13)-(8.9.15).
Hypothetical plots of the predicted values $\hat{Y}^{(1)}$ and $\hat{Y}^{(2)}$ are shown in Fig. 8.20 as a
function of time $t$ (which is $x$ in this case). If a single time is to be chosen
to decide between models 1 and 2, time $t_1$, where the responses coincide,
would not be helpful; time $t_2$, where $(\hat{Y}^{(1)} - \hat{Y}^{(2)})^2$ is a maximum, would be
better. The single best measurement time according to (8.9.15) is when
$(\hat{Y}^{(2)} - \hat{Y}^{(1)})^2$ is a maximum, provided $\sigma_1^2$ and $\sigma_2^2$ vary only slightly with $t$.
The decision of which model to choose depends upon how $\hat{Y}^{(1)}$ and $\hat{Y}^{(2)}$
compare with the measured value $Y$ at the same time. If $\hat{Y}^{(i)}$ is nearer to $Y$
than $\hat{Y}^{(j)}$, then model $i$ would be selected (for $i, j = 1, 2$ and $i \ne j$).

Should $Y$ be midway between $\hat{Y}^{(1)}$ and $\hat{Y}^{(2)}$, there is no basis for model
discrimination. It is interesting to compare the criterion for this case with
the one previously given ($\Delta$). For this latter criterion, $S^{(1)} - S^{(2)}$ would be
zero and thus the observation would be sought at some other $t$ where
$|S^{(1)} - S^{(2)}|$ is a maximum. Hence the two criteria may not yield the same
optimal experiments.

Figure 8.20 Discrimination between two predicted $\hat{Y}$'s using the information theory method.

There are several ways to treat more than two models. One is the
following. After each experiment is performed, the likelihood
associated with each model and its current parameters is computed. We
then design the next experiment in such a manner as to discriminate
between the two models having the largest likelihood values. Another
method for discrimination between more than two models is given by Box
and Hill [29].

8.9.3 Termination Criteria

A general sequential procedure of mechanistic model building can be
visualized as including the steps in Fig. 8.21. (By mechanistic we mean a
model that can be derived from basic principles.) Note that on the left are
tasks performed by the analyst, in the center by the computer, and on the
right by the laboratory. After starting, one can propose some competing
models. G. E. P. Box has made the point that one should not be timid in
proposing models. The process itself should lead to discarding unsuitable
models. Next comes performing experiments, followed by estimating all
the parameters for all the models (block 3). In block 4 optimal
experiments are sought to discriminate between the competing models. The
method of Box and Hill could be used for this purpose. If desired, the
experiment in block 2 could have been designed using the method in
Section 8.9.1, which does not directly utilize experimental data.

After the optimal experimental conditions (designated x) in block 4 are
found, the new experiment is performed (block 5), after which the
estimates for all the parameters, $\mathbf{b}^{(1)}, \mathbf{b}^{(2)}, \ldots$, are found. Then in block 7 a test
is made to ascertain if any of the proposed models is satisfactory. At the
same time certain of them may be discarded. The rest of this section is a
discussion of a termination criterion and suggestions for determining if
another model is needed.

Figure 8.21 A sequential procedure for mechanistic model building including design of experiments to discriminate between competing models.

The wide applicability of the maximum likelihood method of estimation
of parameters and of generalized likelihood ratio tests suggests the
consideration of likelihood ratios in selecting the better of two models.
Suppose that the objective is to choose one of two hypotheses, $H_1$
(Model 1 is correct) or $H_2$ (Model 2 is correct). Let $L^{(i)}(\mathbf{Y}, \mathbf{b}^{(i)})$ be the
maximum joint probability density function associated with the data
obtained thus far for the $i$th model and the associated parameters.
A likelihood ratio test can be constructed as follows:

1. If $L^{(1)}/L^{(2)} < A$, accept hypothesis 2.

2. If $L^{(1)}/L^{(2)} > B$, accept hypothesis 1.

3. If $A \le L^{(1)}/L^{(2)} \le B$, investigate alternate models and perform more
experiments.


Methods for choosing $A$ and $B$ are discussed from differing points of view
by Ghosh [30], by Fedorov [3], and by Bard [31]. Bard suggests that the
relations between $A$ and $B$ and the probabilities of error which Wald [32]
gave for testing simple versus simple hypotheses where sample sizes are
large will work approximately in this situation. That is, if we let $\alpha_1$ be the
probability that $H_1$ is accepted when $H_2$ is true and $\alpha_2$ the probability that
$H_2$ is accepted when $H_1$ is true, then for independent observations

$$A \approx \frac{\alpha_2}{1 - \alpha_1}, \qquad B \approx \frac{1 - \alpha_2}{\alpha_1} \tag{8.9.16a}$$

$$\alpha_1 \approx \frac{1 - A}{B - A}, \qquad \alpha_2 \approx \frac{A(B - 1)}{B - A} \tag{8.9.16b}$$

These relations mean, for example, that if we wish to be 90% certain that
we accept $H_1$ only if $H_1$ is true and 80% certain that we accept $H_2$ only if
$H_2$ is true, then $\alpha_1 = 0.1$ and $\alpha_2 = 0.2$. Then using (8.9.16a), $A \approx 0.2/0.9 =
0.222$ and $B \approx 0.8/0.1 = 8$. If we had started with $A$ and $B$, then the
corresponding probabilities would be found using (8.9.16b).
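
The arithmetic of (8.9.16a) and (8.9.16b), together with the three-way decision rule, can be written as a small routine. This is only a sketch; the function names are hypothetical and the returned strings are arbitrary labels.

```python
def wald_thresholds(alpha1, alpha2):
    """Thresholds A and B of (8.9.16a) from the two error probabilities."""
    A = alpha2 / (1.0 - alpha1)
    B = (1.0 - alpha2) / alpha1
    return A, B

def error_probabilities(A, B):
    """Inverse relations (8.9.16b): error probabilities implied by A and B."""
    alpha1 = (1.0 - A) / (B - A)
    alpha2 = A * (B - 1.0) / (B - A)
    return alpha1, alpha2

def decide(ratio, A, B):
    """Three-way decision on the likelihood ratio L(1)/L(2)."""
    if ratio < A:
        return "accept hypothesis 2"
    if ratio > B:
        return "accept hypothesis 1"
    return "continue"

A, B = wald_thresholds(0.1, 0.2)
print(round(A, 3), round(B, 3))      # the values quoted in the text: 0.222 and 8
print(error_probabilities(A, B))     # recovers approximately (0.1, 0.2)
print(decide(3.0, A, B))
```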

In addition to continuing experimentation when the likelihood ratio is
between $A$ and $B$, one should examine the residuals to see whether insight
can be gained for improving any of the models or for proposing another
model. This would then lead to blocks 8 and 9 in Fig. 8.21. Regions of
large departure in the residuals from random conditions can sometimes
imply improvements in the models. Also, insight into statistical
assumptions can be gained through inspection of the residuals. If, for example, the
residuals are highly correlated for a proposed model, then either the model
should be improved or the errors must be considered as being correlated. If
repeated experiments continue to show high correlation in the residuals,
one should examine them to see if there is some characteristic "signature"
in the residuals. If there is, one should attempt to improve the model to
remove these signatures; if there is no signature, one would model the
errors as being autoregressive, moving average, etc., processes.

Example 8.9.2 

Two models have been proposed for a process in which m different thermocouples 
have been used to make n measurements each. The assumptions of additive, zero 
mean, constant variance, independent, normal errors are made. The variance is 
unknown, there are no errors in the independent variables, and there is no prior 
information. (These assumptions are designated 11111011.) Find the likelihood 
ratio. 


Solution 


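The parameters for the $i$th model are found by maximizing the natural logarithm of
the joint probability density function (pdf) of independent normal errors with
respect to $\boldsymbol{\beta}^{(i)}$. The maximum value of the pdf is

$$L^{(i)} = (2\pi)^{-mn/2} \sigma^{-mn} \exp\left[ -\frac{1}{2\sigma^2} \sum_{j=1}^{m} \sum_{k=1}^{n} \left(Y_{jk} - \hat{Y}_{jk}^{(i)}\right)^2 \right]$$

provided $L^{(i)}$ is also maximized with respect to $\sigma^2$, which leads to

$$\left(\hat{\sigma}^{(i)}\right)^2 = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \left(Y_{jk} - \hat{Y}_{jk}^{(i)}\right)^2$$

Then $L^{(i)}$ becomes

$$L^{(i)} = (2\pi)^{-mn/2} \left(\hat{\sigma}^{(i)}\right)^{-mn} \exp\left(-\frac{mn}{2}\right)$$

and thus the likelihood ratio is

$$\frac{L^{(1)}}{L^{(2)}} = \left(\frac{\hat{\sigma}^{(2)}}{\hat{\sigma}^{(1)}}\right)^{mn}$$

Using this ratio, we can determine to a given confidence whether Model 1
or Model 2 is to be accepted using the procedure described above. Before accepting
either model, one should investigate if the postulated assumptions are actually
reasonable.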
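
A minimal numerical sketch of this likelihood ratio follows, assuming NumPy. The thermocouple readings and the two sets of fitted values are made up so that the result can be checked by hand; the function name `likelihood_ratio` is hypothetical. The ratio is formed through logarithms because $mn$ can be large.

```python
import numpy as np

def likelihood_ratio(Y, Yhat1, Yhat2):
    """L(1)/L(2) = (sigma2_hat / sigma1_hat)**(m*n) for independent normal errors
    with unknown constant variance; Y, Yhat1, Yhat2 are m-by-n arrays."""
    Y = np.asarray(Y, dtype=float)
    N = Y.size                                   # N = m * n
    var1 = np.sum((Y - Yhat1) ** 2) / N          # ML variance estimate, model 1
    var2 = np.sum((Y - Yhat2) ** 2) / N          # ML variance estimate, model 2
    log_ratio = 0.5 * N * (np.log(var2) - np.log(var1))
    return np.exp(log_ratio)

# Made-up data: m = 2 thermocouples, n = 3 readings each.
Y = np.array([[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]])
signs = np.array([[1, -1, 1], [-1, 1, -1]])
# Model 1 residuals are all of size 0.1, model 2 residuals of size 0.2,
# so the ratio should be (0.2 / 0.1)**6 = 64.
ratio = likelihood_ratio(Y, Y + 0.1 * signs, Y + 0.2 * signs)
print(ratio)
```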





REFERENCES 

1. Box, G. E. P. and Lucas, H. L., “Design of Experiments in Nonlinear Situations,”
Biometrika 46 (1959), 77-90.

2. Box, G. E. P. and Hunter, W. G., “Non-sequential Designs for the Estimation of
Parameters in Nonlinear Models,” Tech. Rep. No. M, University of Wisconsin, Dept. of
Statistics, Madison, Wis., 1964.

3. Fedorov, V. V., Teoriya Optimalnogo Eksperimenta, Izdatel'stvo Moskovskogo
Universiteta, 1969; translated by W. J. Studden and E. M. Klimko, Theory of Optimal
Experiments, Academic Press, Inc., New York, 1972.

4. Badavas, P. C. and Saridis, G. N., “Response Identification of Distributed Systems with
Noisy Measurements at Finite Points,” Inf. Sci. 2 (1970), 19-34.

5. McCormack, D. J. and Perlis, H. J., “The Determination of Optimum Measurement
Locations in Distributed Parameter Processes,” Proceedings of the 3rd Annual
Princeton Conference on Information Sciences and Systems, 1969, pp. 510-518.

6. Nahi, N. E., Estimation Theory and Applications, John Wiley and Sons, Inc., New York,
1969.

7. Smith, K., “On the Standard Deviations of Adjusted and Interpolated Values of an
Observed Polynomial Function and its Constants and the Guidance they give Towards a
Proper Choice of the Distribution of Observations,” Biometrika 12 (1918), 1-85.

8. Atkinson, A. C. and Hunter, W. G., “The Design of Experiments for Parameter
Estimation,” Technometrics 10 (1968), 271-289.

9. Carslaw, H. S. and Jaeger, J. C., Conduction of Heat in Solids, 2nd ed., Oxford University
Press, London, 1959.

10. Beck, J. V., “The Optimum Analytical Design of Transient Experiments for
Simultaneous Determinations of Thermal Conductivity and Specific Heat,” Ph.D. Thesis, Dept.
of Mechanical Engineering, Michigan State University, 1964.

11. Heineken, F. G., Tsuchiya, H. M., and Aris, R., “On the Accuracy of Determining Rate
Constants in Enzymatic Reactions,” Math. Biosci. 1 (1967), 115-141.

12. Seinfeld, J. H. and Lapidus, L., Mathematical Methods in Chemical Engineering, Vol. 3,
Process Modeling, Estimation, and Identification, Prentice-Hall, Inc., Englewood Cliffs,
N.J., 1974.

13. Van Fossen, G. J., Jr., “Design of Experiments for Measuring Heat Transfer
Coefficients with a Lumped Parameter Calorimeter,” NASA TN D-7357, 1975.

14. Beck, J. V., “Analytical Determination of Optimum Transient Experiments for
Measurement of Thermal Properties,” Proc. 3rd Int. Heat Transfer Conf. 44 (1966), 74-80.

15. Beck, J. V., “Transient Sensitivity Coefficients for the Thermal Contact Conductance,”
Int. J. Heat Mass Transfer 10 (1967), 1615-1617.

16. Beck, J. V., “Determination of Optimum Transient Experiments for Thermal Contact
Conductance,” Int. J. Heat Mass Transfer 12 (1969), 621-633.

17. Bonacina, C. and Comini, G., “Calculation of Convective Heat Transfer Coefficients for
Time^itmpwsnn^ turns'* 9n» Vbsv fajrsg ftroteKstu* VATV; W-'ATi 

18. Comini, G., “Design of Transient Experiments for Measurements of Convective Heat
Transfer Coefficients,” Int. Inst. Refrig., Freudenstadt (1972), 169-178.

19. Cannon, J. R. and Klein, R. E., “Optimal Selection of Measurement Locations in a
Conductor for Approximate Determination of Temperature Distributions,” J. Dyn. Sys.
Meas. Control 93 (1971), 193-199.





20. Seinfeld, J. H., “Optimal Location of Pollutant Monitoring Stations in an Airshed,”
Atmos. Environ. 6 (1972), 847-858.

21. Beck, J. V., “Analytical Determination of High Temperature Thermal Properties of 
Solids Using Plasma Arcs,” Thermal Conductivity, Proceedings of the Eighth Conference, 
1969. 

22. Van Fossen, G. J., Jr., “Model Building Incorporating Discrimination Between Rival
Mathematical Models in Heat Transfer,” Ph.D. Thesis, Dept. of Mechanical
Engineering, Michigan State University, 1973.

23. Hunter, W. G. and Hill, W. J., “Design of Experiments for Subsets of Parameters,”
Tech. Rep. No. 330, University of Wisconsin, Dept. of Statistics, Madison, Wis., March
1973.

24. Hunter, W. G., Hill, W. J., and Henson, T. L., “Designing Experiments for Precise 
Estimation of All or Some of the Constants in a Mechanistic Model,” Can. J. Chem. 
Eng. 47 (1969), 76-80. 

25. Graybill, F. A., Introduction to Matrices with Applications in Statistics, Wadsworth 
Publishing Company, Inc., Belmont, Calif., 1969. 

26. Myers, G. E., Analytical Methods in Conduction Heat Transfer, McGraw-Hill Book
Company, New York, 1971.

27. Parker, W. J., Jenkins, R. J., Butler, C. P., and Abbott, G. L., “Flash Method of 
Determining Thermal Diffusivity, Heat Capacity, and Thermal Conductivity,” J. Appl. 
Phys. 32 (1961), p. 1679. 

28. Kullback, S., Information Theory and Statistics, John Wiley and Sons, Inc., New York, 
1959. 

29. Box, G. E. P. and Hill, W. J., “Discrimination among Mechanistic Models,” Technomet- 
rics 9 (1967), 57-71. 

30. Ghosh, B. K. Sequential Tests of Statistical Hypotheses. Addison-Wesley, Reading, 
Mass., 1970. 

31. Bard, Y., Nonlinear Parameter Estimation, Academic Press, Inc., New York, 1974. 

32. Wald, A., Sequential Analysis, John Wiley and Sons, Inc., New York, 1947. 

APPENDIX 8A OPTIMAL EXPERIMENT CRITERIA FOR ALL 
PARAMETERS OF INTEREST 

For the standard assumptions of additive, zero mean, normal measurement
errors in the dependent variable, the joint probability density of the
estimated parameter vector $\mathbf{b}$ is

$$p(\mathbf{b}) = (2\pi)^{-p/2} |\mathbf{P}|^{-1/2} \exp\left[ -\tfrac{1}{2} (\mathbf{b} - \boldsymbol{\beta})^T \mathbf{P}^{-1} (\mathbf{b} - \boldsymbol{\beta}) \right] \tag{8A.1}$$

where $\mathbf{P}$ is the covariance matrix of $\mathbf{b}$. This expression also assumes
errorless independent variables. We also assume that the error covariance
matrix $\boldsymbol{\psi}$ is known to within a multiplicative constant $\sigma^2$. These
assumptions are designated 11--101-. Equation (8A.1) is exact if the dependent
variable is linear in the parameters; if $\eta$ is nonlinear in the parameters, then the
expression is approximate.





For the assumptions given above, the confidence region can be found
from an expression similar to [see (6.8.38)]

$$(\mathbf{b} - \boldsymbol{\beta})^T \mathbf{P}^{-1} (\mathbf{b} - \boldsymbol{\beta}) = \text{constant} = C^2 \tag{8A.2}$$

For a given value of $C^2$ this equation describes a hyperellipsoid which has
a hypervolume given by

$$\text{volume} = \frac{\pi^{p/2}\, C^{p} \left( \prod_{i=1}^{p} \lambda_i \right)^{1/2}}{\Gamma\left(\tfrac{p}{2} + 1\right)} \tag{8A.3}$$

where $p$ is the number of parameters, $\Gamma(\cdot)$ is the gamma function, and $\lambda_i$ is
the $i$th eigenvalue of $\mathbf{P}$. Now the determinant of $\mathbf{P}$ is equal to the product
of its eigenvalues. Thus to minimize the hypervolume of a confidence
region, the determinant of $\mathbf{P}$ should be minimized. This is equivalent to
maximizing the determinant of the inverse of $\mathbf{P}$. For the standard
assumptions of 11111-11 this leads to the criterion of maximizing $\Delta = |\mathbf{X}^T\mathbf{X}|$, which
has been given by Box and Lucas [1]. The criterion of max $|\mathbf{P}^{-1}|$ is more
general, however.

Exactly the same criterion can be derived using the Shannon [28]
concept of a measure of uncertainty, which is related to information theory.
He showed that the unique (except for a positive multiplicative factor)
suitable measure of uncertainty associated with the probability density
function of the random parameter vector $\mathbf{b}$, which is denoted $p(\mathbf{b})$, is given
by

$$H(p) = -E(\ln p) = -\int p(\mathbf{b}) \ln p(\mathbf{b})\, d\mathbf{b} \tag{8A.4}$$

Information is gained when uncertainty is reduced. Suppose $p_0(\mathbf{b})$ is the
prior density of $\mathbf{b}$, that is, resulting from previous experiments. Let $p_1(\mathbf{b})$ be
the posterior density after another experiment has been performed. The
amount of information $I$ gained by the experiment is [28]

$$I = H(p_0) - H(p_1) \tag{8A.5}$$

Our goal is to select an experiment that maximizes $I$. Since $H(p_0)$ is
unaffected by the new experiment, we simply minimize $H(p_1)$.

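Let us evaluate $H(p)$ for the standard assumptions 11--101-. Then $p(\mathbf{b})$ is
given by (8A.1) and thus

$$\begin{aligned}
H(p(\mathbf{b})) &= -E[\ln p(\mathbf{b})] = -E\left\{ \tfrac{1}{2}\left[ -p \ln 2\pi - \ln|\mathbf{P}| - (\mathbf{b} - \boldsymbol{\beta})^T \mathbf{P}^{-1} (\mathbf{b} - \boldsymbol{\beta}) \right] \right\} \\
&= \tfrac{1}{2}\left\{ p \ln 2\pi + \ln|\mathbf{P}| + \mathrm{tr}\left[ \mathbf{P}^{-1} E\left( (\mathbf{b} - \boldsymbol{\beta})(\mathbf{b} - \boldsymbol{\beta})^T \right) \right] \right\} \\
&= \tfrac{1}{2}\left\{ p \ln 2\pi + \ln|\mathbf{P}| + \mathrm{tr}[\mathbf{P}^{-1}\mathbf{P}] \right\} = \tfrac{1}{2}\left\{ p(1 + \ln 2\pi) + \ln|\mathbf{P}| \right\}
\end{aligned} \tag{8A.6}$$

where $p$ on the right side designates the number of parameters. Discarding
irrelevant constants, a measure of uncertainty is

$$H^{*}[p(\mathbf{b})] = \ln|\mathbf{P}| \tag{8A.7}$$

But minimizing this function is equivalent to maximizing $|\mathbf{P}^{-1}|$, which was
given above using the minimum confidence volume approach.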
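
Both routes to the criterion can be illustrated with a small calculation. The sketch below assumes NumPy, a straight-line model $\eta = \beta_1 + \beta_2 t$ with unit error variance, and takes the constant on the right side of (8A.2) to be `C2`; the two four-point designs and all function names are made up for the illustration.

```python
import numpy as np
from math import gamma, log, pi

def confidence_volume(P, C2=1.0):
    """Hypervolume of the hyperellipsoid (b - beta)^T P^-1 (b - beta) = C2."""
    p = P.shape[0]
    eigenvalues = np.linalg.eigvalsh(P)          # P is symmetric
    return pi ** (p / 2) * C2 ** (p / 2) * np.sqrt(np.prod(eigenvalues)) / gamma(p / 2 + 1)

def uncertainty(P):
    """H of (8A.6) for a normal density with covariance matrix P."""
    p = P.shape[0]
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (p * (1.0 + log(2.0 * pi)) + logdet)

def covariance(t, sigma2=1.0):
    """P = sigma2 (X^T X)^-1 for the assumed model eta = beta1 + beta2 t."""
    X = np.column_stack([np.ones_like(t), t])
    return sigma2 * np.linalg.inv(X.T @ X)

P_center = covariance(np.array([0.25, 0.5, 0.5, 0.75]))   # points bunched together
P_ends = covariance(np.array([0.0, 0.0, 1.0, 1.0]))       # points at the extremes
for name, P in (("center", P_center), ("ends", P_ends)):
    print(name, 1.0 / np.linalg.det(P), confidence_volume(P), uncertainty(P))
```

The design with the larger $|\mathbf{X}^T\mathbf{X}|$ has both the smaller confidence volume and the smaller uncertainty measure, as the two derivations suggest.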


APPENDIX 8B OPTIMAL EXPERIMENT CRITERIA FOR NOT 
ALL PARAMETERS OF INTEREST 

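Suppose that of the total number $p$ of the estimated parameters only a
subset of them need be estimated accurately. Let the estimated parameter
vector $\mathbf{b}$ be partitioned into two vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ so that

$$\mathbf{b}^T = [\mathbf{b}_1^T \;\; \mathbf{b}_2^T] \tag{8B.1}$$

where $\mathbf{b}$ is $p \times 1$, $\mathbf{b}_1$ is $q \times 1$, and $\mathbf{b}_2$ is an $r$ vector where $r = p - q$. The vector
$\mathbf{b}_1$ consists of those $b$'s of primary interest and $\mathbf{b}_2$ contains the others. Let
the same statistical assumptions denoted by 11--101- and discussed in the
beginning of Appendix 8A be valid. Let the covariance matrix of all the
estimated parameters be designated $\mathbf{P}$ and be partitioned as

$$\mathbf{P} = \begin{bmatrix} \mathbf{P}_{11} & \mathbf{P}_{12} \\ \mathbf{P}_{12}^T & \mathbf{P}_{22} \end{bmatrix} \tag{8B.2}$$

where $\mathbf{P}_{11}$ is $q \times q$ and is for the $\mathbf{b}_1$ vector, etc.

For this case the joint probability density of $\mathbf{b}$ is given by (8A.1). If the
experimenter desires precise estimates of only $\mathbf{b}_1$, Hunter and Hill [23,24]
state that the marginal distribution of $\mathbf{b}_1$ is then needed. It is obtained by
integrating (8A.1) with respect to $\mathbf{b}_2$. From Theorem 10.6.1 of Graybill [25]
the marginal probability density of $\mathbf{b}_1$ is

$$p(\mathbf{b}_1) = (2\pi)^{-q/2} |\mathbf{P}_{11}|^{-1/2} \exp\left[ -\tfrac{1}{2} (\mathbf{b}_1 - \boldsymbol{\beta}_1)^T \mathbf{P}_{11}^{-1} (\mathbf{b}_1 - \boldsymbol{\beta}_1) \right] \tag{8B.3}$$

Following the same reasoning as in Appendix 8A, the criterion is to
maximize

$$\Delta_q = |\mathbf{P}_{11}^{-1}| \tag{8B.4}$$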

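The terms in $\mathbf{P}_{11}$ should be related to the sensitivity matrix. Let $\mathbf{X}$ be
partitioned as

$$\mathbf{X} = [\mathbf{X}_1 \;\; \mathbf{X}_2] \tag{8B.5}$$

where $\mathbf{X}_1$ is $n \times q$ and $\mathbf{X}_2$ is $n \times r$. Then for maximum likelihood estimation
$\mathbf{P} = (\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{X})^{-1}$, where $\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{X}$ can be written as (see (6.1.17a))

$$\mathbf{X}^T \boldsymbol{\psi}^{-1} \mathbf{X} = \begin{bmatrix} \mathbf{X}_1^T \boldsymbol{\psi}^{-1} \mathbf{X}_1 & \mathbf{X}_1^T \boldsymbol{\psi}^{-1} \mathbf{X}_2 \\ \mathbf{X}_2^T \boldsymbol{\psi}^{-1} \mathbf{X}_1 & \mathbf{X}_2^T \boldsymbol{\psi}^{-1} \mathbf{X}_2 \end{bmatrix} \tag{8B.6}$$

Taking the inverse of (8B.6) and identifying the upper left matrix as $\mathbf{P}_{11}$
results in the criterion being to maximize

$$\Delta_q = \left| \mathbf{X}_1^T \boldsymbol{\psi}^{-1} \mathbf{X}_1 - \mathbf{X}_1^T \boldsymbol{\psi}^{-1} \mathbf{X}_2 (\mathbf{X}_2^T \boldsymbol{\psi}^{-1} \mathbf{X}_2)^{-1} \mathbf{X}_2^T \boldsymbol{\psi}^{-1} \mathbf{X}_1 \right| \tag{8B.7}$$

If the errors are independent and have a constant variance (i.e., 11001-11),
this expression reduces to the one given by Hunter and Hill [23,24], which
is

$$\Delta_q = \left| \mathbf{X}_1^T \mathbf{X}_1 - \mathbf{X}_1^T \mathbf{X}_2 (\mathbf{X}_2^T \mathbf{X}_2)^{-1} \mathbf{X}_2^T \mathbf{X}_1 \right| \tag{8B.8}$$

Using (6.1.17), $\Delta_q$ given by (8B.7) can be related to the usual $\Delta$ by

$$\Delta_q = \frac{\Delta}{|\mathbf{X}_2^T \boldsymbol{\psi}^{-1} \mathbf{X}_2|} \tag{8B.9}$$

where $\Delta$ is the determinant of the expression given by (8B.6).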
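
The relations among (8B.4), (8B.8), and (8B.9) can be confirmed numerically for the case of independent, constant-variance errors with unit variance. The sketch below assumes NumPy; the sensitivity functions and measurement times are made up, and `delta_q` is a hypothetical helper name.

```python
import numpy as np

def delta_q(X1, X2):
    """Criterion (8B.8): |X1'X1 - X1'X2 (X2'X2)^-1 X2'X1|."""
    S = X1.T @ X1 - X1.T @ X2 @ np.linalg.solve(X2.T @ X2, X2.T @ X1)
    return np.linalg.det(S)

t = np.linspace(0.0, 2.0, 9)
X1 = np.column_stack([np.exp(-t)])            # q = 1 parameter of primary interest
X2 = np.column_stack([np.ones_like(t), t])    # r = 2 parameters of less interest
X = np.hstack([X1, X2])

P = np.linalg.inv(X.T @ X)                    # covariance of b for unit error variance
P11 = P[:1, :1]                               # block belonging to b1
from_P11 = 1.0 / np.linalg.det(P11)           # |P11^-1| as in (8B.4)
from_ratio = np.linalg.det(X.T @ X) / np.linalg.det(X2.T @ X2)   # as in (8B.9)
print(delta_q(X1, X2), from_P11, from_ratio)  # the three numbers should agree
```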


PROBLEMS 


Unless otherwise stated, assume that the standard conditions designated 11111-11
are valid for the following problems.
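8.1 For $X = t e^{-t}$ show that $\Delta^n$ given by (8.2.3) becomes

$$\Delta^n = \left[ t_n^{-1} - e^{-2t_n}\left( t_n^{-1} + 2 + 2t_n \right) \right] / 4$$

Verify that at $t_n = 1.691817$, $d\Delta^n/dt_n = 0$. At the same value of $t_n$ show that
the sufficient condition for a maximum, $d^2\Delta^n/dt_n^2 < 0$, is also satisfied.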
8.2 For $\eta = \beta \sin t$ show that $\Delta^{+}$ given by (8.2.13) becomes (for $t_n > \pi/2$)

Also show that $\Delta^{+}$ has extrema when $\tan\tau = \tau$ is satisfied. Use Myers [26,
p. 442] to find the first three nonzero positive roots of $\tan\tau = \tau$.
8.3 Derive (8.2.14).





8.4 Consider the model $\eta_i = \beta f(t_i)$ where $f(t_i)$ can assume only the values
indicated below. The optimal conditions for estimating $\beta$ are needed.

     i    f(t_i)       i    f(t_i)       i    f(t_i)
     1     0           5     2           9    -2
     2     1           6     1          10    -3
     3     2           7     0          11    -4
     4     2.5         8    -1          12    -3

(a) What single $i$ should be chosen if only one measurement could be taken?
(b) What $i$ value(s) should be selected if two observations are to be taken?
Repeated observations at any $t_i$ are permitted.

(c) Same as (b) except repeated observations are not permitted.

(d) What three $i$ values would be selected if repeated $i$ values are not
allowed?

8.5 For the model $\eta_i = \beta_1 f_1(t_i) + \beta_2 f_2(t_i)$ the discrete values below are permitted.

     i    f_1(t_i)    f_2(t_i)
     1     10           2
     2      8           4
     3      6           5
     4      3           0
     5      2          -4

(a) What are the two optimum locations to take measurements?

(b) What are the best three locations to take observations?
Repeated values are not permitted.

(c) Same as (b) except repeated values are permitted.

8.6 For the model $\eta = \beta_1 \exp(-\beta_2 t)$ verify that the optimal locations for $n = 4$ are
at $\beta_2 t = 0$ and 1. There are no constraints on $\eta$ or $t$. Study the region
$0 < t < 1.2$ using the spacing of $\Delta t = 0.1$. Use a programmable calculator or a
computer.

8.7 Find the optimal two values of $\beta_2 t$ for estimating $\beta_1$ and $\beta_2$ in the
model $\eta = \beta_1 \sin \beta_2 t$. There are no constraints on $\eta$ or $t$.

8.8 Find the optimal value of $\tau = \beta_2 t_n$ for a large number of uniformly spaced
measurements in $0 < t < t_n$ for the model $\eta = \beta_1 \sin \beta_2 t$. Use a computer if
necessary. No constraints are to be used on $\eta$ or $t$.

8.9 For the model of the cooling billet, $T = T_\infty + (T_0 - T_\infty) \exp(-\beta t)$, find the
optimal duration of the experiment for a large number of equally spaced
measurements. The parameters are $T_0$, $T_\infty$, and $\beta$.

8.10 For the model $\eta = [\beta_1/(\beta_1 - \beta_2)][\exp(-\beta_2 t) - \exp(-\beta_1 t)]$ find expressions for
the $\beta_1$ and $\beta_2$ sensitivity coefficients. See (8.3.19).

8.11 Find general expressions for the sensitivity coefficients plotted in Figs. 8.15
and 8.16.





8.12 A plate which is subjected to a large instantaneous pulse of energy $Q$ at $x = 0$
and is insulated at $x = L$ has the solution for the temperature of

$$T = T_0 + \frac{Q}{cL}\left[ 1 + 2\sum_{n=1}^{\infty} \cos\left(\frac{n\pi x}{L}\right) \exp\left(-n^2 t^{+}\right) \right]$$

where $t^{+} = \pi^2 \alpha t/L^2$, $c$ is the density-specific heat product, and $Q$ has units
of energy (Btu or J) per unit area. For $x = 0$ the temperature is infinity at
time zero and decays to $T_0 + Q/cL$ for large time. At $x = L$ the temperature
starts at $T_0$ and increases to $T_0 + Q/cL$.

(a) Find an expression for the $\alpha$ sensitivity at $x/L = 1$.
(b) Evaluate using a computer the expression found in (a) for $0 < t^{+} < 3$.
For a fixed value of $Q$ (and no restriction on the range of $T$) show that
the optimum time to take a single measurement is $t^{+} = 1.38$. Also show
that this time corresponds to the time that the temperature at $x = L$ has
reached one half of the maximum temperature rise. This "one half" time
is the basis of finding $\alpha$ in pulse or flash experiments. See the paper by
Parker, Jenkins, Butler, and Abbott [27].

(c) Also using a computer, find the optimum experiment duration for many
equally spaced measurements at $x/L = 1$.

8.13 (a) A large number of measurements uniformly spaced in time have been
made at $x = 0$ and $x = L$ in the heat-conducting body discussed in
Section 8.5.2.2. For $m_0$ and $m_1$ sensors at $x = 0$ and $L$, respectively, show
that $\Delta^{+}$ given by (6.3.7) can be written as

where z^m(,/m and and where 

C*^C*o+C*, 

The third subscript in $C^{+}_{ij0}$ or $C^{+}_{ij1}$ refers to $x = 0$ or $L$, respectively. The
standard statistical assumptions are valid.
(b) Derive an expression for $z$ at which $\Delta^{+}$ is a maximum, assuming that $z$
can assume any value in the interval 0 to 1.

(c) The following values are for the heat-conducting body discussed in
Section 8.5.2.2:

$C^{+}_{110} = 0.07609$, $C^{+}_{120} = 0.1062$, $C^{+}_{220} = 0.1552$

$C^{+}_{111} = 0.0148$, $C^{+}_{121} = -0.0422$, $C^{+}_{221} = 0.126$

The values correspond to the dimensionless time of 0.65. The first two
subscripts correspond to $k$ (a 1 subscript) or $c$ (a 2 subscript). Using the
expression derived in part (b), find a value for $z$.

(d) What conclusions can you draw from the results of this problem?



APPENDIX A IDENTIFIABILITY CONDITION


A.1 INTRODUCTION

The problem of investigating the conditions under which parameters can be 
uniquely estimated is called the identifiability problem. A convenient means of 
anticipating slow convergence or even nonconvergence in estimating parameters 
can save unnecessary time and expense. Also if easy-to-apply identifiability condi- 
tions are known, many times insight can be provided to avoid the problem of 
nonidentifiability, through either the use of a different experiment or a smaller set 
of parameters that are identifiable. 

The purpose of this appendix is to derive the identifiability criterion that the 
sensitivity coefficients in the neighborhood of the minimum sum of squares function 
must be linearly independent over the range of the measurements. This criterion 
applies for linear and nonlinear estimation. This criterion is derived only for a 
weighted sum of squares function which includes least squares, weighted least 
squares, and ML estimation with normal errors, in each case with no constraints on 
the parameters. For MAP estimation with prior parameter information it might be 
possible to estimate the parameters even if the sensitivity coefficients are linearly 
dependent. 

This condition of independence of the sensitivity coefficients is particularly 
convenient if the number of the parameters is not large, say, less than six. Even if 
the number is larger, linear dependence between two or three of the parameters can 
sometimes be readily detected from graphs of the sensitivity coefficients. The 
plotting of the coefficients is extremely important and should be done for each new 
problem before attempting to estimate the parameters. 
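The check described above can also be done numerically before any plotting. The following is a minimal sketch, assuming a hypothetical model in which only the product of two parameters affects the response; the names `model` and `scaled_sensitivities` are illustrative and do not come from the text. It approximates the scaled sensitivity coefficients by central differences (nonzero parameter values are assumed) and tests their linear dependence through the rank of the resulting matrix.

```python
import numpy as np

def model(beta, t):
    # Hypothetical model chosen for illustration: only the product
    # beta[0] * beta[1] affects the response, so the two parameters
    # cannot be estimated separately.
    return beta[0] * beta[1] * t

def scaled_sensitivities(model, beta, t, rel_step=1e-6):
    """Return the n x p matrix whose columns are beta_i * d(eta)/d(beta_i),
    approximated by central finite differences (beta_i must be nonzero)."""
    beta = np.asarray(beta, dtype=float)
    X = np.empty((t.size, beta.size))
    for i in range(beta.size):
        h = rel_step * abs(beta[i])
        up, dn = beta.copy(), beta.copy()
        up[i] += h
        dn[i] -= h
        X[:, i] = beta[i] * (model(up, t) - model(dn, t)) / (2.0 * h)
    return X

t = np.linspace(0.1, 2.0, 20)
X_plus = scaled_sensitivities(model, [2.0, 3.0], t)

# Rank below the number of parameters signals linear dependence.
rank = np.linalg.matrix_rank(X_plus, tol=1e-6 * np.linalg.norm(X_plus))
print("number of parameters:", X_plus.shape[1], "rank:", rank)
```

Here the two columns coincide, so the rank is one and only the product of the two parameters could be estimated. Plotting the columns of `X_plus` against `t` would show the same thing as two overlapping curves.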


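A.2 THEORY 

Consider a general sum of squares function for n measurements given by 

\[ S = \sum_{u=1}^{n} \sum_{v=1}^{n} (Y_u - \eta_u)\, w_{uv}\, (Y_v - \eta_v) \tag{A.1} \]

where $w_{uv}$ is an element of W, a square, symmetric, positive-definite matrix. Let the 
function S possess continuous derivatives in the neighborhood of its minimum in 
the parameter space, which occurs when $\eta$ is evaluated at $\beta^*$, 

\[ \beta^* = [\beta_1^* \;\; \beta_2^* \;\; \cdots \;\; \beta_p^*]^T \tag{A.2} \]

A Taylor series expansion of S in the neighborhood of its minimum is 

\[ S(\beta^* + \Delta\beta) = S(\beta^*) + \sum_{i=1}^{p} S_i^* \,\Delta\beta_i + \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} S_{ij}^* \,\Delta\beta_i \,\Delta\beta_j + \cdots \tag{A.3a} \]

\[ S_i^* \equiv \frac{\partial S(\beta^*)}{\partial \beta_i}, \qquad S_{ij}^* \equiv \frac{\partial^2 S(\beta^*)}{\partial \beta_i \,\partial \beta_j} \tag{A.3b} \]

Using S defined by (A.1) in (A.3b) gives 

\[ S_i^* = -2 \sum_{u=1}^{n} \sum_{v=1}^{n} w_{uv} (Y_u - \eta_u^*)\, X_{vi}^*, \qquad X_{vi} \equiv \frac{\partial \eta_v}{\partial \beta_i} \tag{A.4a} \]

\[ S_{ij}^* = 2 \sum_{u=1}^{n} \sum_{v=1}^{n} w_{uv} \left[ X_{ui}^* X_{vj}^* - (Y_u - \eta_u^*)\, X_{vij}^* \right], \qquad X_{vij} \equiv \frac{\partial^2 \eta_v}{\partial \beta_i \,\partial \beta_j} \tag{A.4b} \]

The expression $X_{vi}$ in (A.4a) is called a sensitivity coefficient. 

For a model linear in the parameters the cross derivative $X_{vij}$ in (A.4b) is equal 
to zero, as are also the third and higher order derivatives of S. Note that the 
condition of continuous derivatives of S with respect to $\beta$ will be satisfied if $\eta$ and 
its derivatives are continuous functions of $\beta$. A necessary condition for S to 
possess a minimum at $\beta^*$ is that 

\[ S_i^* = 0 \quad \text{for } i = 1, 2, \ldots, p \tag{A.5} \]

Define the determinant $D_r$ as 

\[ D_r = \begin{vmatrix} S_{11}^* & S_{12}^* & \cdots & S_{1r}^* \\ S_{21}^* & S_{22}^* & \cdots & S_{2r}^* \\ \vdots & \vdots & & \vdots \\ S_{r1}^* & S_{r2}^* & \cdots & S_{rr}^* \end{vmatrix} \tag{A.6} \]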

Then S, approximated by the terms explicitly given by (A.3a), has a unique local 
minimum if in addition to (A.5) being true it is also true that 

\[ D_r > 0 \quad \text{for } r = 1, 2, \ldots, p \tag{A.7} \]

which is the condition that the matrix of second derivatives $[S_{ij}^*]$ be 
positive-definite; see reference 1. A minimum can exist with weaker conditions, 
however. For example, if $D_r = 0$ a minimum may exist but it may not be unique; 
that is, the minimum could be along a line rather than at a point. The conditions 
given by (A.5) and (A.7) are necessary and sufficient conditions for a unique 
local minimum. 
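The test in (A.7) amounts to checking that every leading principal minor of the matrix of second derivatives is positive. A minimal sketch follows, assuming the second derivatives have already been collected in a symmetric array; the function names and the two example matrices are illustrative only, and the tolerance `tol` is an added assumption because an exact comparison with zero is fragile in floating-point arithmetic.

```python
import numpy as np

def leading_minors(H):
    """Determinants D_1, ..., D_p of the leading r x r blocks of H."""
    H = np.asarray(H, dtype=float)
    return [np.linalg.det(H[:r, :r]) for r in range(1, H.shape[0] + 1)]

def has_unique_minimum(H, tol=1e-12):
    """True when every leading principal minor exceeds tol, as in (A.7)."""
    return all(d > tol for d in leading_minors(H))

# Illustrative second-derivative matrices (values are made up).
H_point = np.array([[4.0, 1.0],
                    [1.0, 3.0]])   # minimum at a point
H_line = np.array([[4.0, 2.0],
                   [2.0, 1.0]])    # D_2 = 0: minimum along a line

print(leading_minors(H_point), has_unique_minimum(H_point))
print(leading_minors(H_line), has_unique_minimum(H_line))
```

For larger problems an eigenvalue or Cholesky test is usually preferred to explicit determinants, but the minors mirror the notation used here.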

We wish to relate the conditions of $D_r > 0$ and $D_r = 0$ to the sensitivity 
coefficients. 

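Let us define 

\[ \Delta\beta_i^+ \equiv \frac{\Delta\beta_i}{\beta_i^*}, \qquad S_{ij}^+ \equiv \beta_i^* \beta_j^* S_{ij}^* \tag{A.8a} \]

\[ X_{ui}^+ \equiv \beta_i^* X_{ui}^*, \qquad X_{uij}^+ \equiv \beta_i^* \beta_j^* X_{uij}^* \tag{A.8b} \]

Then using $S_i^* = 0$ for all i, (A.3a) can be written 

\[ S(\beta^* + \Delta\beta) - S(\beta^*) \approx \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} S_{ij}^+ \,\Delta\beta_i^+ \,\Delta\beta_j^+ \tag{A.9} \]

where (A.8a) and (A.8b) are employed in $S_{ij}^+$, 

\[ S_{ij}^+ = 2 \sum_{u=1}^{n} \sum_{v=1}^{n} w_{uv} \left[ X_{ui}^+ X_{vj}^+ - (Y_u - \eta_u^*)\, X_{vij}^+ \right] \tag{A.10} \]

Now (A.9) is a quadratic form and if a unique minimum is to exist it is necessary 
that all the determinants 

\[ D_r^+ = \begin{vmatrix} S_{11}^+ & \cdots & S_{1r}^+ \\ \vdots & & \vdots \\ S_{r1}^+ & \cdots & S_{rr}^+ \end{vmatrix}, \qquad r = 1, 2, \ldots, p \tag{A.11} \]

be greater than zero. Suppose that a minimum exists at $\beta^*$ but that the minimum is 
not unique; for example, it exists along a line or in a plane. This results in $D_r^+ = 0$ 
for some r. 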

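Suppose first that the term in (A.10) involving $X_{vij}^+$ is negligible in its 
contribution to $S_{ij}^+$. Notice that this term becomes negligible as the residuals 
$Y_u - \eta_u^*$ become small, but this is not true for the $X_{ui}^+ X_{vj}^+$ terms. (For 
linear-in-the-parameters cases $X_{vij}^+$ is always zero.) Furthermore assume that 

\[ w_{uv} = \begin{cases} \sigma_u^{-2}, & u = v \\ 0, & u \neq v \end{cases} \tag{A.12} \]

and then $S_{ij}^+$ given by (A.10) becomes 

\[ S_{ij}^+ \approx 2 \sum_{u=1}^{n} (\sigma_u^{-1} X_{ui}^+)(\sigma_u^{-1} X_{uj}^+) \tag{A.13} \]

The summation in (A.13) can be considered to form an inner product involving 
vectors $a_i$, $i = 1, 2, \ldots, r$, 

\[ a_i = \left[ \sigma_1^{-1} X_{1i}^+ \;\; \sigma_2^{-1} X_{2i}^+ \;\; \cdots \;\; \sigma_n^{-1} X_{ni}^+ \right]^T \tag{A.14} \]

Use this interpretation in (A.13) and introduce (A.13) into (A.11) to get 

\[ D_r^+ = 2^r \begin{vmatrix} a_1 \cdot a_1 & \cdots & a_1 \cdot a_r \\ \vdots & & \vdots \\ a_r \cdot a_1 & \cdots & a_r \cdot a_r \end{vmatrix} \tag{A.15} \]

which can be considered a Gram determinant of $a_1, a_2, \ldots, a_r$. It is known that $D_r^+$ 
is equal to zero if and only if the vectors $a_i$ are linearly dependent, which means 

\[ C_1 \sigma_k^{-1} X_{k1}^+ + C_2 \sigma_k^{-1} X_{k2}^+ + \cdots + C_r \sigma_k^{-1} X_{kr}^+ = 0 \tag{A.16a} \]

or 

\[ C_1 X_{k1}^+ + C_2 X_{k2}^+ + \cdots + C_r X_{kr}^+ = 0 \tag{A.16b} \]

for $k = 1, 2, \ldots, n$ and for not all $C_i$ being equal to zero. In other words, if the 
(continuous) sensitivity coefficients are linearly dependent in the neighborhood of the 
minimum, there is no unique minimum and all the r parameters cannot be 
simultaneously and uniquely estimated. This is the desired relation. Note, however, 
that this result assumes that the term involving $X_{vij}^+$ in (A.10) can be dropped, 
$w_{uv}$ is given by (A.12), there is no prior information, and there are no parameter 
constraints. 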
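The Gram determinant argument can be checked numerically. The sketch below is illustrative only: the function name, the measurement times, and the two sets of sensitivity columns are assumptions, not taken from the text. It scales each row of a sensitivity matrix by the reciprocal standard deviation, forms the inner products of the resulting column vectors, and evaluates the determinant with the factor 2^r.

```python
import numpy as np

def gram_determinant(X_plus, sigma):
    """Evaluate 2**r * det[a_i . a_j], where a_i is the i-th column of
    X_plus with each row divided by the corresponding sigma."""
    A = X_plus / sigma[:, None]        # columns are the vectors a_i
    G = A.T @ A                        # Gram matrix of inner products
    r = G.shape[0]
    return 2.0 ** r * np.linalg.det(G)

t = np.linspace(0.0, 1.0, 11)
sigma = np.full(t.size, 0.5)           # assumed constant standard deviation

# Assumed sensitivity columns: t and t**2 are linearly independent,
# whereas t and 3*t are linearly dependent.
X_independent = np.column_stack([t, t ** 2])
X_dependent = np.column_stack([t, 3.0 * t])

D_independent = gram_determinant(X_independent, sigma)
D_dependent = gram_determinant(X_dependent, sigma)
print(D_independent, D_dependent)
```

The first determinant is clearly positive, while the second vanishes to within round-off, which is the situation in which the parameters cannot all be estimated.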





Suppose (A.16) is written in the form 

\[ \sum_{j=1}^{r} C_j X_{uj}^+ = 0, \qquad r = 1, 2, \ldots, p \tag{A.17} \]

where at least one $C_j$ is not equal to zero. Also form the summation involving a row 
in (A.11), 

\[ \sum_{j=1}^{r} S_{ij}^+ C_j = 2 \sum_{j=1}^{r} \sum_{u=1}^{n} \sum_{v=1}^{n} w_{uv} \left[ X_{ui}^+ X_{vj}^+ - (Y_u - \eta_u^*)\, X_{vij}^+ \right] C_j \]
\[ = 2 \sum_{u=1}^{n} \sum_{v=1}^{n} w_{uv} \left[ X_{ui}^+ \sum_{j=1}^{r} C_j X_{vj}^+ - (Y_u - \eta_u^*) \sum_{j=1}^{r} C_j X_{vij}^+ \right] \tag{A.18} \]

Differentiating (A.17) with respect to $\beta_i$ yields 

\[ \sum_{j=1}^{r} C_j X_{uij}^+ = 0, \qquad u = 1, \ldots, n; \quad i = 1, \ldots, p; \quad r = 1, \ldots, p \tag{A.19} \]

Using (A.17) and (A.19) in (A.18) then produces 

\[ \sum_{j=1}^{r} S_{ij}^+ C_j = 0, \qquad r = 1, \ldots, p; \quad i = 1, \ldots, p \tag{A.20} \]

We have shown for linear dependence of the sensitivity coefficients, (A.17), that a 
given column of the square matrix $D_r^+$ given by (A.11) can be considered to be a 
linear combination of the other columns. But if any column of a square matrix is a 
linear combination of the other columns, then the determinant of that matrix is 
zero. Consequently, the sum of squares function S does not have a unique 
minimum in the $\beta_1, \beta_2, \ldots, \beta_r$ space and thus not all of these parameters can be 
uniquely determined. 

The results given above apply for r equal to 1 to p parameters. Note, however, if 
the linear dependence condition of the sensitivities given by (A.17) is satisfied for 
$r < p$, then it is also satisfied for $r+1, r+2, \ldots, p$ because $C_{r+1}, C_{r+2}, \ldots, C_p$ can be 
set equal to zero in (A.17). 

A.3 COMMENTS 

(a) Parameters cannot all be uniquely estimated for $\eta$ being linear or nonlinear in 
the parameters if the sensitivity coefficients are linearly dependent over the range 
of the measurements. This is true if (i) the S function is formed by some weighted 
least squares function, (ii) the sensitivities are continuous functions of the 
parameters, (iii) there is no prior information regarding the parameters, and 
(iv) there are no constraints on the parameters. 

(b) If the $X_{vij}^+$ term in (A.10) is negligible or is dropped, the determinant of 
$[S_{ij}^+]$ is proportional to $|X^T W X|$, which must not be zero when attempting to 
estimate parameters using the Gauss method discussed in Section 7.4. Hence, using 
the Gauss method, none of the parameters can be estimated if the sensitivities are 
linearly dependent. Other methods might permit one to obtain certain parameters, 
but not all, since there would be no unique minimum of S if the conditions in (a) 
above are true. 

(c) Regardless of the form of W, if $|X^T X| = 0$ it is also true that $|X^T W X| = 0$. 
Hence if the sensitivities are linearly dependent, there is no choice of W possible 
that will cause $|X^T W X|$ to be not zero. 

(d) If $|X^T X| \neq 0$, then $|X^T W X|$ may or may not be equal to zero. But if ML 
estimation is used as mentioned in (b), $|X^T W X|$ would not be zero if $|X^T X| \neq 0$. 
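Comments (c) and (d) can be illustrated numerically. The following sketch rests on assumptions that are not in the text: the sensitivity matrices, the randomly generated positive-definite weighting matrices, and the helper name are all hypothetical. Instead of the raw determinant it reports the ratio of the smallest to the largest eigenvalue of the weighted matrix, which is zero exactly when the determinant is zero but does not depend on the overall scale.

```python
import numpy as np

def relative_smallest_eigenvalue(X, W):
    """Smallest eigenvalue of X^T W X divided by the largest one."""
    M = X.T @ W @ X
    lam = np.linalg.eigvalsh((M + M.T) / 2.0)   # symmetrize against round-off
    return lam[0] / lam[-1]

rng = np.random.default_rng(0)
n = 8
t = np.linspace(0.0, 1.0, n)

# Third column equals 2*(first) + 3*(second): dependent sensitivities.
X_dep = np.column_stack([np.ones(n), t, 2.0 + 3.0 * t])
# Two linearly independent columns.
X_ind = np.column_stack([np.ones(n), t])

# (c): no positive-definite W rescues linearly dependent sensitivities.
ratios_dep = []
for _ in range(5):
    B = rng.normal(size=(n, n))
    W = B @ B.T + n * np.eye(n)                 # random positive-definite W
    ratios_dep.append(abs(relative_smallest_eigenvalue(X_dep, W)))

# (d): independent sensitivities with a positive-definite W are fine,
# but a W that is not positive-definite can still give a zero determinant.
ratio_pd = relative_smallest_eigenvalue(X_ind, np.eye(n))
W_singular = np.zeros((n, n))
W_singular[0, 0] = 1.0                          # only one weighted measurement
ratio_singular = abs(relative_smallest_eigenvalue(X_ind, W_singular))

print(max(ratios_dep), ratio_pd, ratio_singular)
```

The singular weighting matrix in the last case is included only to show how the "may or may not" in (d) can arise; it lies outside the positive-definite W assumed in (A.1).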


A.4 RELATION TO EIGENVALUES 

The determinant $D_r^+$ is numerically equal to the product of its eigenvalues 
$\lambda_1, \lambda_2, \ldots, \lambda_r$. Now the matrix on the right side of (A.11) is real and symmetric, 
which results in all the eigenvalues being real. Also, since 

\[ D_r^+ = \prod_{i=1}^{r} \lambda_i \tag{A.21} \]

$D_r^+$ will then equal zero if and only if at least one of the $\lambda_i$ values is equal to 
zero. 

Since $S_{ij}^+$ is normalized so that the scale of the $\beta$ values (or the choice of their 
units) is unimportant, the relative magnitudes of the $\lambda_i$ values are significant. If one 
value is much smaller than the others (but not zero), the $X^T W X$ matrix is probably 
ill-conditioned and the minimum of S is not well defined. This would occur when 
there is "almost" linear dependence of the sensitivity coefficients. In such cases 
there will be relatively large inaccuracy (large variances) in the parameters. 
Consider the case of two parameters; then the eigenvalues $\lambda_1$ and $\lambda_2$ in 

\[ \begin{vmatrix} S_{11}^+ - \lambda & S_{12}^+ \\ S_{12}^+ & S_{22}^+ - \lambda \end{vmatrix} = 0 \tag{A.22} \]

are 

\[ \lambda_1, \lambda_2 = \tfrac{1}{2} \left\{ S_{11}^+ + S_{22}^+ \mp \left[ (S_{11}^+ + S_{22}^+)^2 - 4\Delta \right]^{1/2} \right\} \tag{A.23} \]

where $\Delta$ is the determinant in (A.22) with $\lambda$ set equal to zero. Let $\lambda_1$ be the smaller 
eigenvalue. Then the ratio of the eigenvalues, $\lambda_1/\lambda_2$, is always between zero and one. 
$\lambda_1/\lambda_2$ is given by 

\[ \frac{\lambda_1}{\lambda_2} = \frac{1 - (1 - \xi)^{1/2}}{1 + (1 - \xi)^{1/2}} \tag{A.24} \]

where 

\[ \xi = \frac{4\Delta}{(S_{11}^+ + S_{22}^+)^2} \tag{A.25} \]

which is also limited to between zero and one. For small $\xi$ it can be demonstrated 
that 

\[ \frac{\lambda_1}{\lambda_2} \approx \frac{\xi}{4} \tag{A.26} \]

Only for $\xi = 1$ does $\lambda_1/\lambda_2$ equal unity; $\xi$ equals one only when $S_{12}^+ = S_{21}^+ = 0$ and 
$S_{11}^+ = S_{22}^+$. 

The above analysis suggests for more than two parameters that the criterion of 
small 

\[ \psi = \frac{\left| (X^+)^T W X^+ \right|}{\left[ p^{-1}\, \mathrm{tr}\!\left( (X^+)^T W X^+ \right) \right]^p} \tag{A.27} \]

could be used to see if there is near linear dependence of the sensitivity coefficients. 
(Note that if the $X_{vij}^+$ term in (A.10) can be dropped, the components of 
$(X^+)^T W X^+$ are given by $S_{ij}^+/2$.) When $\psi$ goes to zero, at least one eigenvalue is 
equal to zero; the maximum value of $\psi$ is unity. 

Thus in addition to plotting the sensitivity coefficients, one could examine $\psi$ to 
see if it is near zero. If it is, the experiment is poorly designed and one or more of 
the parameters should not be estimated, but rather certain groups of parameters. If 
possible, the experiment should be redesigned so that $\psi$ is not so small. However, 
the recommended criterion for accomplishing this is not $\psi$, but rather the 
numerator of (A.27), subject to certain constraints. See Chapter 8. 
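The eigenvalue relations of this section are easy to verify numerically. The sketch below is illustrative: the 2 x 2 matrix stands in for the normalized second-derivative matrix with made-up entries, and the function name `psi` is an assumption. It compares the eigenvalue ratio computed directly with the closed form of (A.24), and evaluates the determinant-to-trace criterion, which for two parameters coincides with the quantity defined in (A.25).

```python
import numpy as np

def psi(M):
    """det(M) / (tr(M)/p)**p for a symmetric p x p matrix M."""
    M = np.asarray(M, dtype=float)
    p = M.shape[0]
    return np.linalg.det(M) / (np.trace(M) / p) ** p

# Made-up normalized matrix with strongly correlated sensitivities.
S_plus = np.array([[2.0, 1.9],
                   [1.9, 2.0]])

lam = np.linalg.eigvalsh(S_plus)           # ascending order
ratio_direct = lam[0] / lam[1]

delta = np.linalg.det(S_plus)
xi = 4.0 * delta / np.trace(S_plus) ** 2
ratio_formula = (1.0 - np.sqrt(1.0 - xi)) / (1.0 + np.sqrt(1.0 - xi))

print("eigenvalue ratio:", ratio_direct, ratio_formula)
print("xi:", xi, "psi:", psi(S_plus), "xi/4:", xi / 4.0)
print("psi for identity:", psi(np.eye(3)))
```

Because the criterion is a ratio of a determinant to a power of the trace, multiplying the matrix by a constant leaves it unchanged, so the factor of two between the second-derivative matrix and the weighted sensitivity product does not matter.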


REFERENCES 

1. G. S. G. Beveridge and R. S. Schechter, Optimization: Theory and Practice, McGraw-Hill 
Book Company, New York, 1970, p. 217. 



APPENDIX B 

[Full-page table of estimators and covariance matrices; contents not recoverable from the scan.] 

$\psi$ = cov($\varepsilon$) for assumptions denoted 11. (See Section 6.1.5 or inside rear cover for list of standard assumptions.) 



APPENDIX C 

List of symbols 


ENGLISH SYMBOLS 

$a_i$                  Coefficients in S, see (7.6.3) 
a1                     First order AR errors 
$A_{i+1}$              Term used in sequential methods, (6.7.8a) and (7.8.23a) 
AR                     Autoregressive 
b                      Parameter vector estimated from estimation equation, [p×1] 
$C_{ij}$               Component of $X^T W X$, C, (7.4.13) 
C                      $\equiv X^T W X$, (7.4.12), [p×p] 
cov( )                 Covariance, cov(A, B) = E{[A − E(A)][B − E(B)]} 
D                      Matrix used for dependent observations, $\varepsilon$ = Du, [n×n] 
e                      Residual vector = Y − $\hat{Y}$ 
e                      Eigenvector, Section 6.8 
E( )                   Expectation operator 
f( )                   Probability density 
f(Y|$\beta$)           Probability density of Y given $\beta$ 
f($\beta$|Y)           Probability density of $\beta$ given Y 
F                      Modified observation vector, F = $D^{-1}$Y where D comes from 
                       $\varepsilon$ = Du 
$F_{1-\alpha}(p, n-p)$ F statistic associated with (1 − $\alpha$)100% confidence region and 
                       p and n − p degrees of freedom 
G                      Related to the slope of S, (7.6.8a) 
h                      Acceleration factor, (7.6.1) 
H                      $\equiv X^T W (Y - \eta)$, (7.4.15) 
H                      See (7.4.16) 
i                      Subscript or superscript 


j                      Subscript 
k                      Subscript or superscript 
$l_{1-\alpha}(p)$      Coefficient for confidence region, Section 6.8 
L                      Likelihood function, Sections 3.2.5 and 6.1.6 
LS                     Least squares 
m                      Number of observations at a given time 
MA                     Moving average 
MAP                    Maximum a posteriori 
ML                     Maximum likelihood 
n                      Number of observations or number of observation times 
p                      Number of parameters 
P( )                   Probability 
P                      Covariance matrix of estimators, [p×p] 
$P_{LS}$               = $(X^T X)^{-1}$ 
$P_{ML}$               = $(X^T \psi^{-1} X)^{-1}$ 
$P_{MAP}$ 
Q                      Quadratic form, (6.1.27) 
R                      Minimum S; for LS, R = $(Y - \hat{Y})^T (Y - \hat{Y})$ 
s                      Estimated standard deviation of observation errors, s = $(s^2)^{1/2}$ 
                       where $s^2 = \hat{\sigma}^2$; for independent constant variance 
                       errors, $s^2$ = R/(n − p) 
S                      Sum of squares function, scalar 
$S_{LS}$               Least squares sum of squares, $(Y - \eta)^T (Y - \eta)$ 
$S_{ML}$               Maximum likelihood sum of squares; for standard 
                       assumptions, $(Y - \eta)^T \psi^{-1} (Y - \eta)$ 
$S_{MAP}$              MAP loss function; for standard MAP assumptions, 
                       $S_{MAP} = (Y - \eta)^T \psi^{-1} (Y - \eta) + (\beta - \mu_\beta)^T V_\beta^{-1} (\beta - \mu_\beta)$ 
t                      Time 
$t_{1-\alpha/2}(n-p)$  t statistic associated with (1 − $\alpha$)100% confidence region 
                       and n − p degrees of freedom 
u                      Random component for correlated errors, Appendix 6A, 
                       $\varepsilon$ = Du 
$u_{\alpha/2}$         100(1 − $\alpha$/2) percentage point of the normal distribution 
V( )                   Variance operator; V(A) = E{[A − E(A)]$^2$} 
$V_b$                  Covariance matrix of b, [p×p] 
$V_\beta$              Covariance matrix of $\beta$, [p×p] ($\mu_\beta$ is prior vector of $\beta$) 
$w_{ij}$               Component of weighting matrix W 
W                      Weighting matrix; for ML, W = $\psi^{-1}$, [n×n] 
x                      Coordinate or independent variable 
X                      Sensitivity matrix; X = $(\nabla_\beta \eta^T)^T$. If $\eta$ is linear in the 
                       parameters as in $\eta$ = X$\beta$, $(\nabla_\beta \eta^T)^T$ reduces to X. 



Y                      Observation vector, [n×1] 
$\hat{Y}$              Predicted vector of observations, [n×1]; for linear case 
                       $\hat{Y}$ = Xb 
Z                      Modified sensitivity matrix, Z = $D^{-1}$X, [n×p] 


GREEK SYMBOLS 

$\alpha$               Associated with confidence interval or region percent confidence, 
                       see Section 6.8 
$\alpha$               See Section 7.6 for parameter related to reducing the interval for 
                       calculating S 
$\beta$                Parameter vector, [p×1] 
$\Gamma$( )            Gamma function, see Section 6.8 
$\Delta$               Optimum experiment criterion (see Chapter 8) for standard assumptions 
                       of Y = $\eta$ + $\varepsilon$, E($\varepsilon$) = 0, $\varepsilon$ with normal density, known 
                       independent variable values, and $\psi$ known within a multiplicative 
                       constant 
$\varepsilon$          Error vector, [n×1], usually Y = $\eta$ + $\varepsilon$ 
$\eta$                 Expected value vector, regression vector, model vector, [n×1] 
$\theta$               Moving average parameter 
$\lambda$              Eigenvalue, Section 6.8 
$\mu_\beta$            Parameter vector known from prior information, [p×1] 
$\rho$                 Autoregressive parameter 
$\rho$                 Correlation coefficient, (2.6.17) 
$\sigma$               Standard deviation of constant variance observation errors 
$\sigma^2$             Constant variance of observation errors 
$\sigma_i^2$           Variance of $\varepsilon_i$, V($\varepsilon_i$) = $\sigma_i^2$ 
$\sigma_u^2$           Variance of $u_i$, V($u_i$) = $\sigma_u^2$ 
$\sigma_\varepsilon^2$ Variance of $\varepsilon_i$ used for the AR case designated a1, 
                       $\sigma_\varepsilon^2 = \sigma_u^2 (1 - \rho^2)^{-1}$. See below (6.9.9). 
$\phi$                 Diagonal matrix, usually $\phi$ = E($uu^T$) for E(u) = 0 and where 
                       E($u_i u_j$) = 0 for i ≠ j, [n×n] 
$\chi^2$               Chi-squared statistic 
$\psi$                 Covariance matrix of the observation errors; for E($\varepsilon$) = 0, 
                       $\psi$ = E($\varepsilon \varepsilon^T$) 
$\Omega$               Known part of $\psi$, as in $\psi = \sigma^2 \Omega$ where $\sigma^2$ is unknown, [n×n] 

OTHER SYMBOLS 

$\nabla_\beta$( )      Matrix derivative operator, $\nabla_\beta = [\partial/\partial\beta_1 \;\; \cdots \;\; \partial/\partial\beta_p]^T$ 



APPENDIX D 

Some estimation programs 


In this appendix a few computer programs are referenced. Many others are 
available. For additional references see Himmelblau [D1, pp. ..., 203], Bard 
[D2, pp. 323, 324], and Kuester and Mize [D3]. 

LINEAR ESTIMATION PROGRAMS 

LINFIT A linear least squares program with optional constraints to make the 
parameters nonnegative, add to a constant, etc. This is one of eighteen statistical 
routines written by J. R. Miller [D4]. 

LINREG A linear least squares program that is described in reference D3 where 
an example and the listing are given. 

OMNITAB A general purpose computer program for statistical and numerical 
analysis [D5]. 


NONLINEAR ESTIMATION PROGRAMS 

BARD A nonlinear least squares program that uses the Gauss method [D3, p. 
218]. 

BSOLVE A nonlinear least squares program that uses Marquardt’s method [D3]. 

NLIN IBM Share Program SD 3094 written by Marquardt and others. Written in 
FORTRAN IV for IBM 7040. Uses Marquardt’s method with derivatives or finite 
difference approximations to solve weighted least squares problems. 


NLINA This is a program written at Michigan State University by J. V. Beck and 
available from him. It uses the sequential and Box-Kanemasu modifications of the 
Gauss method. 

SSQMIN This program uses the Powell procedure and is discussed in reference 
D3. 


REFERENCES 


D1. Himmelblau, D. M., Process Analysis by Statistical Methods, John Wiley & Sons, Inc., 
New York, 1970. 

D2. Bard, Y., Nonlinear Parameter Estimation, Academic Press, Inc., New York, 1974. 

D3. Kuester, J. L. and Mize, J. H., Optimization Techniques with Fortran, McGraw-Hill 
Book Co., New York, 1973. 

D4. Miller, J. R., On-Line Analysis for Social Scientists, MAC-TR-40, Project MAC, 
Massachusetts Institute of Technology, Cambridge, Mass., 1967. 

D5. Hilsenrath, J., Ziegler, G., Messina, C. G., Walsh, P. J., and Herbold, R., OMNITAB, A 
Computer Program for Statistical and Numerical Analysis, Nat. Bur. of Std. Handbook 
101, U.S. Government Printing Office, Washington, D.C., 1966. Reissued Jan. 1968 
with corrections. 



Index 


Abbott, G. L., 475, 480 

Abramowitz, M., 77 

Al-Araji, S., 263, 319 

Analysis of covariance, 131 

Analysis of variance, 130, 131, 175, 178 

Aris, R., 474 

Arkin, H., 78 

Assumptions, Gauss-Markov, 134, 232 
standard, 134, 228, 229 
violation of, 185—204, 290-319, 393, 
400,401,459,460 
Atkinson, A. C., 435, 438, 439, 474 
Autocovariance, 59 

Bacon, D. W., 379,414 
Badavas, P. C., 432, 474 
Bard, Y., 4, 24, 335, 362, 364, 375, 386, 
411,414,472, 475,493,494 
Bayesian estimation, 97-101. See also 
Maximum a posteriori estimation 
Bayes’s theorem, 46, 47, 160, 164, 270 
Beale, E. M. L., 414 

Beck, J. V., 263, 319, 415, 474, 475, 494 
Beveridge, G. S. G., 338, 414, 487 
Bevington, P. R., 24 
Beyer, W. H., 78, 319 


Bias, 89 
Bias error, 180 
Bonacina, C., 474 
Booth, G.W., 414 

Box, G. E. P., 24, 114, 129, 162, 204, 229, 
232, 319, 359, 363, 364, 369-376, 
380, 386, 414, 415, 419, 432, 438, 
439,469,470,474, 475 
Box, M. J., 4 

Box-Kanemasu interpolation method, 
362-377,387,494 
Box-Muller transformation, 126 
Brownlee, K. A., 204, 226 
Bryson, A. E., Jr., 24 
Burington, R. S., 78, 204, 319, 415 
Butler, C. P., 475, 480 

Cannon, J. R., 474 
Carslaw, H. S., 474 
Central limit theorem, 64, 67, 186 
Chebyshev’s inequality, 62 
Chi-squared test, 268, 269 
Cochran’s theorem, 176 
Coefficient of multiple determination, 
173-175 

Colored errors, see Errors, correlated 


Colton, R. R., 78 
Comini, G., 474 
Computer programs, 493, 494 
Confidence interval, 102-108, 290, 380, 381 
approximate, 380-386 
matrix formulation, 290 
mean, 102, 106 
points on regression line, 184 
standard deviation, 105 
Confidence region, 300, 301, 380-386 
ill-determined, 383 
known covariance matrix, 290, 298 
likelihood ratio, 383, 383, 386 
matrix formulation, 290-301 
minimum, 419 
nonlinear, 378-386 
probabilities of, 294 
σ² unknown, 299, 301 
Consistency, 90, 186 
Correlation coefficient, 66, 67 
Correlation matrix, approximate, 379, 380 
Cost, 123 
Covariance, 66, 67 
Covariance matrix, 120, 222 
autoregressive errors, 322 
least squares, 238, 240, 439 
maximum a posteriori, 272, 489 
maximum likelihood, 239, 489 
minimum, 239 
parameters, 452 
approximate, 378, 379 
for predicted points on MAP regression 
line, 489 
on ML regression line, 260, 489 
for OLS, 239, 489 
Covariance matrix of errors, uncertainty of, 
273, 274 
Cramér-Rao or Cramér-Fréchet-Rao lower 
bound, 91, 433 
Cross covariance, 59 

Daniel, C., 226, 319 
Data, see Measurements 
Davies, M., 376, 377, 414 
Degrees of freedom, 73, 73, 76, 176 
Density function, probability, 37 
Dependence, linear, 22 
Dependent events, 44 
Design, experimental, see 
Experiments, optimal 
Determinant, 215-217, 219 
Deutsch, R., 6, 24, 319 
Digital data acquisition, 2, 32, 419 
Discrimination, 8, 464, 473 
based on information theory, 467-470 
likelihood ratio test, 472 
termination criteria, 470, 473 
Distribution, Bernoulli, 65 
binomial, 66 
bivariate, 39 
chi-squared, 73, 74 (table) 
conditional, 43 
exponential, 73 
F, 76, 77 (table) 
gamma, 72 
marginal, 49 
multivariate, 39 
noninformative prior, 98 
normal, 67, 70 (table), 154, 230 
multivariate, 71, 230, 231 
OLS estimator, 241 
OLS residual sum of squares, 241 
Poisson, 66 
posterior, 97 
prior, 97. See also Information, prior 
probability, 36 
t, 75, 76 (table) 
uniform, 67 
variance, 73 
Distribution function, 37 
Draper, N. R., 4, 204, 226, 229, 232, 319, 
415 

Efficiency, 91, 186 
Eigenvalue, see Matrix, eigenvalues 
Eisenhart, C., 320 
Error function, 293 
complementary, 400, 445, 446, 450 
Errors, additive, 118, 134, 228 
autoregressive, 191, 192, 229, 303, 312, 
314, 320-325, 460 
first order, 303 
moving average, 191 
second order, 320, 325 
special cases, 305, 324, 325 
constant variance, 134, 228 
violation of, 188, 190, 459, 460 
correlated, 190, 192, 393, 400, 401, 460 
matrix analysis, 301, 325 

cumulative, 303, 305, 306, 408 
measurement, 7, 132 
normal, 230 

moving average, 191, 312—314 

nonconstant variances, 260, 261 

process, 133 

standard assumptions, 134, 228, 229 
uncorrelated, 134, 228 
zero mean, 134, 228 
violation, 185, 186 
Estimate, see Estimator; Estimation 
Estimation, comparison of nonlinear 
methods, 371—377 

involving ordinary differential equations, 
350-361 

nonlinear, 334-410 
optimal, see Experiments, optimal 
physical and statistical parameters, 
315-319 

sequential, 275-289 
matrix inversion lemma, 391-393 
multiresponse, 387-393 
nonlinear, 387-410 
state, 6, 288, 289 

see also Gauss-Markov assumptions. Least 
squares estimation; Maximum 
a posteriori estimation; Maximum 
likelihood estimation 
Estimation programs, linear, 493 
nonlinear, 493, 494 
Estimator, 84 
properties of, 89-101 
table of for simple models, 152, 153 
unbiased, 232 

see also Bayes estimation; Least squares 
estimation; Maximum a posteriori 
estimation; Maximum likelihood 
estimation 
Event, 32, 33 
disjoint, 33, 34 
independent, 44 
Expected value, 51, 55 
Expected value matrix, 120, 222 
Experiments, 32, 33 
factorial, 252-259 
optimal, 6, 7, 14, 18, 149, 419—463 
attainable region, 435, 436, 437 
constraints, 419, 420, 426, 427, 
435-438, 451,455,457,458 
criteria, 422, 432-434, 475-477 


equally-spaced measurements, 421, 422, 

440-443, 444-446, 458, 459 

multiresponse cases, 434 

not all parameters of interest, 461 463, 
477,478 

one-parameter cases, 420-432 
operability region, 433, 436, 437 
same number of measurements as 
parameter, 434-440 
simplex, 435 

Factorial design, 253, 255 
Factors, 253 
coded, 254 
qualitative, 252 
quantitative, 252 
Farnia, K., 415 
Fedorov, 419, 432, 472, 474 


Filter, 277 
Kalman, 289 

Finite differences, 16, 334, 410, 411 


Fisher, R. A., 78 


383, 386 

F test, 242, 243, 244, 263, 387. See also 
Model building 


Gain matrix, 277 

Gallant, A. R., 370, 371, 380, 385, 414 


Gauss, K. F., 24 

Gauss estimator, 341 

Gaussian distribution, see Distribution, 


normal 

Gauss-Markov assumptions, 134, 232 
estimation, 121, 489 
nonlinear, 346, 389 
sequential, 277 
theorem, 232—234 
Gauss method, 340—349 
modifications to, 363—378 
Gauss-Newton method, see Gauss method 
Ghosh, B. K., 472, 475 
Goldfeld, S. M.,242, 319 
Grashof number, 329 
Graupe, D., 5, 24, 414 
Graybill, F. A., 475, 477 
Guttman, F., 414 


Hald, A., 78 
Hammersley, J. M., 129 



Handscomb, D. C., 129 
Hartley, H. O., 28, 364, 414 
Heat transfer coefficient, 246, 339 
conduction, 227, 263, 332, 400, 410 
semi-infinite body, 400, 401, 445-453 
convection, 143, 236, 238, 328, 329 
cooling billet, 243-247, 357-361, 
397-399, 443, 444 
multiresponse data, 402, 404 
Heineken, F. G., 457 
Henson, T. L., 414, 475 
Herbold, R., 494 
Hildebrand, F. B., 217, 248, 319 
Hill, W. J., 469, 470, 475, 477 
Hilsenrath, J., 494 
Himmelblau, D. M., 319, 493, 494 
Ho, Yu-Chi, 24 
Hoerl, A. E., 187, 320 
Homoskedasticity, see Constant variance 
errors 
Hunter, J. S., 4, 414 
Hunter, W. G., 4, 232, 386, 415, 419, 435, 
438, 439, 474, 475, 477 
Hypothesis, null, 112, 177 
simple, 109 
testing, 108, 113 

Identifiability, 4, 13, 17, 19, 23, 228, 
346, 481, 497 
Identification, 5, 8 
Ill-conditioned problem, 287, 335, 371, 
379, 380, 382, 486, 487 
Independence, 44 
Independent variables, errorless, 134, 229 
errors in, 192, 204 
nonstochastic, 134, 229 
Information, for discrimination, 468 
prior, 97, 134, 229 
subjective, 159, 162-165, 269 
estimation with, 272, 273 
prior, 285 
theory of, 467, 469, 476 
Invariant embedding, 371 

Jacobian, 220 
Jaeger, J. C., 474 
Jenkins, G. M., 24 
Jenkins, R. J., 475, 480 
Jones, A., 376, 414 

Kanemasu, H., 363, 364, 369, 370-374, 
414 
Kennard, R. W., 287, 320 
Klein, R. E., 474 
Klimko, E. M., 474 
Kline, S. J., 59, 77 
Kmenta, J., 5, 24 
Kreith, F., 24, 204 
Kuester, J. L., 493, 494 
Kullback, S., 468, 475 

Lack of fit, 154. See also Sum of squares, 
lack of fit 
Lagrange multiplier, 194 
method of, 192, 194 
Lapidus, L., 5, 414, 457, 474 
Law of large numbers, 63 
Least squares estimation, 2, 4, 10, 23, 120, 
135, 153, 489 
autoregressive errors, 306, 308 
matrix form, 234, 248 
ordinary, see Least squares estimation 
sequential, 277 
unbiased, 238 
weighted, 247, 248 
Legendre, A. M., 24 
Levenberg, K., 362, 368-370, 414 
method of, 363, 370 
modified method, 370 
Lewis, T. O., 24 
Likelihood function, 230 
Likelihood ratio tests, 112 
Linear estimation, algebraic formulation, 
130, 204 
matrix formulation, 213-319 
Linear model, interaction terms, 255 
matrix form, 225 
Log likelihood function, 230 
Lucas, H. L., 419, 432, 438, 439, 474, 476 

McClintock, F. A., 77 
McCormack, D. J., 432, 474 
MAP estimates, see Maximum a posteriori 
estimation 
Marquardt, D. W., 187, 320, 362, 370, 371, 
414, 493 
Marquardt method, 370, 371, 373 
Matrices, 213, 219 
product of, 214 
Matrix, covariance, see Covariance matrix 




diagonal, 216 

eigenvalues, 218, 219, 287, 291, 292, 
294-296, 476, 486 
gain, 277 

idempotent, 214, 240 
identity, 216 

inverse, 215—218, 327, 328 
inversion lemma, 277, 326, 327 
negative definite, 219 
negative semidefinite, 219 
nonsingular, 215 
null, 218 
partitioned, 218 
determinant, 218 
inverse, 218 

positive definite, 218, 219 
positive semidefinite, 219 
rectangular, 214 
square, 214 
symmetric, 214 
trace of, 219 
transpose, 215 
Matrix calculus, 219-221 
Matrix derivative, 219, 220 
Maximum a posteriori estimation, 98, 122, 
159-167, 208, 271,333, 489 
matrix form, 269-274 
nonlinear, 346 
random parameters, 159 
sequential, 277, 284 
subjective prior information, 159 
Maximum likelihood, covariance matrix of 
parameters, 259 
estimate of σ², 157, 158 
Maximum likelihood estimation, 94, 122, 
154-159, 259-269, 489 
autoregressive errors, 308-312 
matrix formulation, 259—269 
nonlinear, 389 
using prior information, 158 
sequential, 277 
sum of squares function, 230 
May, D. C, Jr., 78, 204, 415 
Mean, 86, 124 

Measurements, continuous, 339 
expected value, 151 
multiresponse, 226-228, 231, 232 
predicted value, 151 
repeated, 167-173, 181, 258 
smoothed, 277 


see also Errors 
Median, 85, 124, 188 
Meeter, D. A., 414 
Melsa, J. L., 5, 24 
Mendel, J. M., 5, 24, 277, 320 
Messina, C. G., 494 
Miller, J. R., 493, 494 
Minimum expected squared deviation 
estimation, 93 

Minimum variance unbiased estimators, 92, 
188, 232 

Mize, J. H., 493, 494 
Mode, 124 
Model, 4, 117 
incorrect, 180, 181 
linear, algebraic, 8, 131, 225-228 
in parameters, 18 
restrictions, 132 
mechanistic, 359 

nonlinear in parameters, 13, 15, 16, 18, 
19,334, 342, 343,347,351,352, 

357, 358,367,372,376, 381,385, 
397,400,406,411-413 
probabilistic, 84 
simple linear, 130, 131 
Model building, 178, 386, 387. See also 
Discrimination; F-test 

Monte Carlo, examples, 125, 317-319, 382, 
400, 401 
methods, 125 
Moody chart, 211 
Muller, M. E., 129 
Myers, G. E., 475,478 
Myers, R. H., 24, 319 

Nahi, N. E., 432, 474 
Newton-Gauss, see Gauss method 
Normal density function, see Distribution, 
normal 

Normal equations, 136, 235 
Normality, standard assumption, 134, 229 
standard assumption, violation, 186-188 
Nusselt number, 236-238 

Observation, 32. See also Errors 
Odell, P. L., 24 

Ordinary least squares, see Least squares 
estimation 
Outcome, 32, 33 
Owen, D. B., 77 



Parameters, constant, 229 
nonrandom, 134 
random, 134, 208, 229 
vector, 270 
Parker, W. J., 475, 480 
Parsimony, 4, 247, 257, 263, 361 
Partial differential equation of conduction, 
optimal experiments, 444, 459 
Pearson, E. S., 78 
Perlis, H. J., 432, 474 
Peterson, T. I., 414 
Polynomials, orthogonal, 248-252 
Power, 114 
Prandtl number, 236 
Predicted values, 136 
Prior, see Distribution, prior; Information, 
prior 
Probability, 32, 33 
Probabilities, conditional, 43 
Property, 2 
Pseudorandom numbers, 126 

Quadratic form, 221 
expected value, 224 
matrix derivative of, 221 
Quandt, R. E., 242, 319 
Quasilinearization, 371 

Rabinowicz, E., 10, 24 
Randomness, 29 
Random numbers, 126, 147 
Random variable, 32, 33 
continuous, 36 
discrete, 36 
functions of, 48 
Regression analysis, 130, 131 
Regression function, 131 
Repeated data, see Measurements, repeated 
Residuals, 11, 136, 188, 301, 302 
relative, 211 
signatures, 458 
sum of, 145 
Reynolds number, 145, 211, 236-238 
Rice, J. R., 319 
Ridge analysis, 287 
Ridge regression estimation, 287, 289 
Ruedenberg, K., 364 
Runs, experimental, 253 
number of, 303 

Sage, A. P., 24 
Sample path, 42 
Sample space, 32, 33 
continuous, 35 
denumerably infinite, 34 
discrete, 34 
finite, 34 
Saridis, G. N., 432, 474 
Schechter, R. S., 338, 414, 487 
Search, comparison of, 371 
direct, 337 
dynamic programming, 335 
exhaustive, 336, 337 
Fibonacci, 337 
Gauss, see Gauss method 
gradient, see Gauss method 
halving-doubling method, 375 
Hooke-Jeeves, 338 
linearization, see Gauss method 
random, 337 
simplex, 338 
trial and error approach, 15, 335, 336 
Seinfeld, J. H., 4, 414, 457, 474, 475 
Sensitivity, 14 
Sensitivity coefficient, 4, 17, 18, 22, 228, 
358, 406, 410, 413, 446, 448, 450, 
453, 455, 481 
finite difference evaluation, 410, 411 
linear dependence, 349 
Sensitivity equation, 19, 411, 413 
Sensitivity matrix, 225, 226, 340 
Sequential estimation, multiresponse, 286 
Sequential method, advantages, 283, 283, 
289 
Sequential optimization, 460, 461 
Significance, level of, 112 
Significant linear regression, 184 
Shannon, 476 
Smith, H., 24, 204, 226, 319, 415 
Smith, K., 432, 474 
Smooth values, 136 
Splines, 252 
Squared error loss estimators, 122 
Standard deviation, 56, 137 
Standard errors, estimated, 137 
State variable, 2 
Statistic, 84 
Steepest descent, method of, 369 
Stegun, I. A., 77 
Stochastic approximation, 371 





Studden, W. J., 474 
Sufficiency, 50 
Sufficient statistic, 93 
Sum of squares, contours, 347, 348 
error, 173, 175, 178 
lack of fit, 178 
least squares, 10, 14 
maximum a posteriori, 270 
maximum likelihood residual, 267 
minimization for nonlinear models, 
334-410 
pure error, 178 
regression, 173, 175 
residuals, 240, 241 
total, 173, 175 
Swed, F. S., 320 

Taylor series, matrix form, 338 
Tiao, A. C., 114, 162, 204 
Tsuchiya, H. M., 474 

Unbiased estimator, for σ², 139, 141, 241 
matrix form, 263 
Unbiasedness, 89 
Uncertainty, measure of, 476 


Union, 34 

Van Fossen, G. J., Jr., 415, 457, 474, 475 
Variance, 56, 57 
estimation of, 87 

Variance-covariance matrix, see Covariance 
matrix 

Variance error, 181 
Variate, continuous, 31 
discrete, 31 

Variation, coefficient of, 56 
Vector, column, 213 

Wald, A., 472, 475 
Walsh, P. J., 494 

Weighted least squares, sequential 
estimation, 277 
Welty, J. R., 236, 319 
Whitting, 1. J., 376, 377, 414 
Wolberg, J. R., 24 
Wood, F. S., 226, 319 

Yates, F., 78 

Ziegler, G., 494 



